DOCUMENT RESUME

ED 058 318                                                  TM 001 034

AUTHOR         Echternacht, Gary J.; And Others
TITLE          An Evaluation of the Feasibility of Confidence
               Testing as a Diagnostic Aid in Technical Training.
INSTITUTION    Educational Testing Service, Princeton, N.J.
SPONS AGENCY   Air Force Human Resources Lab., Wright-Patterson AFB,
               Ohio.
REPORT NO      AFHRL-TR-71-33
PUB DATE       Jul 71
NOTE           139p.



EDRS PRICE     MF-$0.65 HC-$6.58
DESCRIPTORS    *Confidence Testing; *Cost Effectiveness; Educational
               Diagnosis; Evaluation; *Feasibility Studies; Grades
               (Scholastic); Guessing (Tests); Item Analysis; *Job
               Training; Multiple Choice Tests; Personality
               Assessment; Psychometrics; Questionnaires; Scoring
               Formulas; Student Attitudes; Student Opinion; Teacher
               Attitudes; *Technical Education; Testing Programs



ABSTRACT 

The feasibility and the cost-effectiveness of using 
confidence testing as a diagnostic aid in technical training programs 
were studied. Two types of confidence testing, Pick-One and 
Distribute 100 Points, were developed for comparison to conventional 
multiple-choice testing. The criteria for feasibility included end of
block examination grades, number of student remediational sessions,
and both student and instructor attitudes. In addition, the
relationship of various personality variables to confidence test
scores was examined for both types of confidence testing. The major 
finding was that while scoring was somewhat more time consuming, end 
of block examination grades improved slightly and the number of 
remediations required declined slightly when either confidence 
testing method was employed. Other areas of investigation produced 
essentially null results. Copies of the Student Attitude 
Questionnaire and the Instructor Questionnaire are appended. 
(Author/MS) 






U.S. DEPARTMENT OF HEALTH, 
EDUCATION & WELFARE 
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRO- 
DUCED EXACTLY AS RECEIVED FROM 
THE PERSON OR ORGANIZATION ORIG- 
INATING IT. POINTS OF VIEW OR OPIN- 
IONS STATED DO NOT NECESSARILY 
REPRESENT OFFICIAL OFFICE OF EDU- 
CATION POSITION OR POLICY. 



AFHRL-TR-71-33 





AN EVALUATION OF THE FEASIBILITY 
OF CONFIDENCE TESTING AS A DIAGNOSTIC AID 
IN TECHNICAL TRAINING 



By 

Gary J. Echternacht 

Educational Testing Service 

Wayne S. Sellman, Capt, USAF 

Technical Training Division (AFHRL) 

Robert F. Boldt 

Educational Testing Service 

Joseph D. Young, Capt, USAF 

3345th Technical School



TECHNICAL TRAINING DIVISION 
Lowry Air Force Base, Colorado 



July 1971 



Approved for public release; distribution unlimited.



AIR FORCE HUMAN RESOURCES LABORATORY

AIR FORCE SYSTEMS COMMAND

BROOKS AIR FORCE BASE, TEXAS



NOTICE 



When US Government drawings, specifications, or other data are used
for any purpose other than a definitely related Government
procurement operation, the Government thereby incurs no
responsibility nor any obligation whatsoever, and the fact that the
Government may have formulated, furnished, or in any way supplied 
the said drawings, specifications, or other data is not to be regarded by 
implication or otherwise, as in any manner licensing the holder or any 
other person or corporation, or conveying any rights or permission to 
manufacture, use, or sell any patented invention that may in any way 
be related thereto. 



AFHRL-TR-71-33 



July 1971 






AN EVALUATION OF THE FEASIBILITY OF CONFIDENCE TESTING 
AS A DIAGNOSTIC AID IN TECHNICAL TRAINING 



By 

Gary J. Echternacht 

Educational Testing Service 

Wayne S. Sellman, Capt, USAF 

Technical Training Division (AFHRL) 

Robert F. Boldt 

Educational Testing Service 

Joseph D. Young, Capt, USAF 

3345th Technical School 



Approved for public release; distribution unlimited.




TECHNICAL TRAINING DIVISION 
AIR FORCE HUMAN RESOURCES LABORATORY 
AIR FORCE SYSTEMS COMMAND 
Lowry Air Force Base, Colorado 






FOREWORD 



This research represents a portion of the exploratory development
program of the Technical Training Division, Air Force Human Resources
Laboratory. The work was documented under Project 1121, Technical
Development; Task 112103, Evaluating Individual Proficiency
Training Programs, and was completed during the period July 1970 through
June 1971. Dr. Marty R. Rockway was the Project Scientist and Capt Wayne
S. Sellman was the Task Scientist. The study was performed in cooperation
with the 3345th Technical School, Chanute AFB, Illinois. The services of
Educational Testing Service, Princeton, New Jersey, were obtained under
Contract Flil609-7O-C-Ooili, for which Dr. Robert F. Boldt and Dr. Gary J.
Echternacht served as co-principal investigators. Capt Wayne S. Sellman
was the Air Force technical monitor and Capt Joseph D. Young was the
Chanute AFB project officer.

Included among the many individuals who contributed to the accomplishment
of this study were Major B. J. Dunnington, MSgt J. R. Fitzpatrick, and
Mr. J. E. Ross, Jet Engine Branch, Aircraft Maintenance Training;
and SMSgt H. L. McKellip, MSgt R. E. Cosner, and Mr. C. H. Ervin, Aerospace
Ground Equipment Branch, Department of Weapon Systems Support Training,
3345th Technical School, Chanute AFB, Illinois.



This report has been reviewed and is approved. 



GEORGE K. PATTERSON, Colonel, USAF 
Commander 






ABSTRACT 



This report describes a study to determine the feasibility and the
cost-effectiveness of using confidence testing as a diagnostic aid in
technical training programs. Two types of confidence testing, Pick-One
and Distribute 100 Points, were developed for comparison to conventional
multiple-choice testing. The study was carried out in two technical training
courses, Aerospace Ground Equipment Repairman (AGE) and Jet Engine Mechanic
(JEM), currently being taught at Chanute Air Force Base, Illinois. The
criteria for feasibility included end of block examination grades, number of
student remediational sessions, and both student and instructor attitudes.
In addition, the relationship of various personality variables to confidence
test scores was examined for both types of confidence testing. The major
finding was that while scoring was somewhat more time consuming, end of block
examination grades improved slightly and the number of remediations required
declined slightly when either confidence testing method was employed. Other
areas of investigation produced essentially null results.



SUMMARY 



Echternacht, G. J., Sellman, W. S., Boldt, R. F., and Young, J. D. An
evaluation of the feasibility of confidence testing as a diagnostic aid
in technical training. AFHRL-TR-71-33. Lowry AFB, Colo: Technical Training
Division, Air Force Human Resources Laboratory, July 1971.



Problem 



The purposes of this study were: (1) to determine the feasibility
of using confidence testing, where the student responds in terms of his
degree of confidence in item alternatives, as a diagnostic evaluative aid
to instructors in Air Force technical training courses; and (2) to
determine the cost-effectiveness of confidence testing versus the conventional
multiple-choice testing now used in Air Force technical training courses.

Approach 

Two experimental forms of confidence testing, termed Pick-One and 
Distribute 100 Points, were developed for use in the experiment. These 
experimental forms of testing were used by students in two different 
courses as was traditional multiple-choice testing. In addition, a special 
type of student remediation was developed and used with each type of testing 
as was the standard remediation procedure. 

The various types of testing and remediation were used with daily quizzes 
administered as diagnostic aids. Criterion data consisted of end of block 
examination scores and the number of remediations required of each student. 
Both students and instructors were also asked to indicate their attitude 
toward the confidence methods. Records were kept indicating the length of 
time required to score the confidence tests and the time required for their 
administration. Personality tests which might be related to any tendency to 
mark confidence in a manner unrelated to achievement were also administered. 

Results 



In the analysis of the end of block examination scores, significant 
interactions were found which did not allow the interpretation of overall 
differences between the types of testing. The effectiveness of the types of 
testing varied with the different training shifts involved in the experiment.
In general, the group using multiple-choice testing obtained lower end of 
block examination grades than did either group using confidence testing though 
the size of this effect varied from shift to shift. Students in the group 
using multiple-choice testing also required more remediations on the average 
than students using confidence testing. 



It was found that confidence testing required slightly more administration 
time than did multiple-choice and that it required about twice as much time to 
score. Students were found to be only slightly favorable to confidence testing 
while instructors tended to indicate that it was more precise than their needs 
required and disliked the required increase in scoring time. In contrast to 
some studies, no personality variables were found to be substantially corre- 
lated with the confidence test score when differences in the number right were 
controlled. 



Conclusions 



Two experimental methods of confidence testing were developed for use 
with diagnostic tests administered in technical training courses. These 
methods resulted in improved end of block examination scores in instances
where differences in the types of testing were found to be significant and 
in fewer average remediations. No significant personality variables were 
found to be substantially related to the process of allocating one's con- 
fidence. On the negative side, the time required to administer and score the 
quizzes increased, especially the scoring time, and the instructors objected 
to this increased scoring time. 

This summary was prepared by Wayne S. Sellman, Technical Training 
Division, Air Force Human Resources Laboratory.






TABLE OF CONTENTS

SECTION I.     Introduction
SECTION II.    Sample
SECTION III.   Design
SECTION IV.
                 Type of Testing and End of Block Scores
                 Type of Testing and Number of Student Remediations
                 Student Attitudes
                   Items Common to Each Type of Testing
                   Items Common to the Two Types of Confidence Testing
                 Instructor Questionnaires
                 Personality Variables as Related to Confidence Testing
                 Quiz Administrative Time and Scoring
SECTION V.     Conclusions
SECTION VI.    Recommendations
REFERENCES
APPENDIX I.    Procedures and Scoring
APPENDIX II.   Personality Tests
APPENDIX III.  Types of Testing Formats
APPENDIX IV.   Student Attitudes
APPENDIX V.    Detailed Analysis of Block Scores
APPENDIX VI.   Frequency Distributions of Responses to
               Student Attitude Questionnaires
APPENDIX VII.  Instructor Questionnaires



TABLES AND FIGURES

Figure 1.  Schedule for Aerospace Ground Equipment Repairman
Figure 2.  Schedule for Jet Engine Mechanic

Table 1.   Correlations of the End of Block Examination Scores in
           AGE with Standard Deviations on the Diagonal
Table 2.   Effects (group mean - grand mean) of the Types of
           Testing in Block 7 of Shift B in AGE
Table 3.   Effects (group mean - grand mean) of the Types of
           Testing in Block 6 of Shift C in AGE
Table 4.   Effects (group mean - grand mean) of the Types of
           Testing in Block 2 of Shift A Using Special
           Remediation in JEM
Table 5.   Effects (group mean - grand mean) of the Types of
           Testing in Shift A Using Control Remediation
           in JEM
Table 6.   Effects (group mean - grand mean) of the Types of
           Testing in Shift B Using Special Remediation
           in JEM
Table 7.   Average Number of Remediations Per Student in AGE
Table 8.   Average Number of Remediations Per Shift in JEM
Table 9.   Correlation Matrix for Personality Variables, Rights-Only,
           and Pick-One Confidence Test Score on Quiz 1
Table 10.  Correlation Matrix for Personality Variables and Pick-One
           Confidence Test Score with Rights-Only Test Score
           Partialled on Quiz 1
Table 11.  Correlation Matrix for Personality Variables, Rights-Only,
           and Pick-One Confidence Test Score on Quiz 2
Table 12.  Correlation Matrix for Personality Variables and Pick-One
           Confidence Test Score with Rights-Only Test Score
           Partialled on Quiz 2
Table 13.  Correlation Matrix for Personality Variables, Rights-Only,
           and Distribute 100 Points Confidence Test Score
           on Quiz 1
Table 14.  Correlation Matrix for Personality Variables and
           Distribute 100 Points Confidence Test Score with
           Rights-Only Test Score Partialled on Quiz 1
Table 15.  Correlation Matrix for Personality Variables,
           Rights-Only, and Distribute 100 Points Confidence
           Test Score on Quiz 2
Table 16.  Correlation Matrix for Personality Variables and
           Distribute 100 Points Confidence Test Score with
           Rights-Only Test Score Partialled on Quiz 2
Table 17.  Frequency of Testing and Scoring Times Reported by
           Institutions for Daily Quizzes Using Confidence and
           Multiple-Choice Testing



SECTION I 



Introduction 








One of the primary tasks that faces instructors in technical training
situations is that of accurately assessing student knowledge of course
materials. Often daily quizzes are administered as "diagnostic" aids to
identify areas of instruction in which students are strong or weak.
Although the recent past has seen the development of numerous devices and
techniques for improving instruction, little has been done to improve the
methods of measuring student achievement and diagnosing their strengths
and weaknesses.

One of the most popular methods of testing student achievement is
through the use of multiple-choice test items where the examinee is
presented a question and a number of alternatives from which he is to choose
the correct answer. However, the notion of requiring an examinee to choose
only one alternative from a fixed number has been subject to criticism. For
many items the examinee is quite sure as to the correct choice and has no
difficulty indicating it; on the other hand, he may be able to eliminate some
of the alternatives and then be forced to guess between the rest. Knowledge
is not an all-or-none proposition. It seems reasonable to assume that a
student who can eliminate some alternatives has more knowledge or insight than
one who can eliminate none, and a student who selects an answer and indicates
his doubt as to its correctness has more knowledge or insight than one who is
completely misinformed and yet certain of his answer.

One possible approach for providing diagnostic information to instructors
is confidence testing. Confidence testing attempts to provide a means of
determining a student's degree of confidence in his response to various tests
and performance situations. How should an examinee indicate his degree of
confidence when choosing responses in the face of uncertainty? Possible
solutions to this problem of method of response and the corresponding scoring
system have been appearing in the literature since the mid 1930's and more
recently have been associated with the names of de Finetti (1965), Coombs,
Milholland and Womer (1956), and Shuford, Albert, and Massengill (1966). A
complete review of the literature can be found in Echternacht (1971).

In confidence testing two assumptions are usually made: the examinee
must be interested in obtaining a high score, and the scoring rule must be
known to the examinee. When an examinee using confidence testing encounters
an item for which he is uncertain of the correct response, his answer should
reflect his degree of belief (i.e., his subjective probability) about the
correctness of the various alternatives. This can be accomplished in a
number of ways. Although not based on subjective probability, Coombs,
Milholland and Womer utilize a response method where the examinee crosses out
all alternatives he believes to be false. Another system not based upon
subjective probability, developed by Ebel (1965), utilizes a five-choice
true-false format. The subjective probability approach advocated by
de Finetti requires the examinee to allocate five stars or points over
the alternatives present in such a way as to reflect his degree of belief
in the alternatives he believes possible. In another noteworthy subjective
probability measure, developed by Shuford and Massengill (1969), the
examinee uses a device termed SCoRule to show his degree of belief in each
alternative.

In the scoring for the approaches not based on subjective probability,
a somewhat arbitrary method is used, such as obtaining a certain score for
each incorrect alternative crossed out and a certain penalty score for each
correct alternative eliminated. The subjective probability approach
utilizes a concept termed reproducibility in the development of scoring
systems. Basically, a scoring system is termed reproducible if an examinee
can maximize his expected score with respect to his state of knowledge
only by responding with his true subjective probabilities for each alternative.
Shuford and Massengill use a logarithmic scoring function to this end, while
de Finetti relies on an approximation to what he terms the continuous method.
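
The reproducibility property can be checked numerically. The following is a
minimal sketch, not the scoring procedure used in the study. It assumes a
four-alternative item, an untruncated logarithmic score, and an illustrative
belief vector; the names expected_log_score and best_report are hypothetical
and introduced here only for illustration.

```python
import itertools
import math


def expected_log_score(belief, report):
    """Expected score when alternative i is correct with probability belief[i]
    and the examinee earns log(report[i]) for the alternative that is correct."""
    return sum(b * math.log(r) for b, r in zip(belief, report) if b > 0)


def best_report(belief, step=0.05):
    """Search every report on a grid (multiples of `step`, all positive,
    summing to 1) and return the one with the highest expected score."""
    n = round(1 / step)
    best, best_score = None, -math.inf
    for parts in itertools.product(range(1, n), repeat=len(belief) - 1):
        last = n - sum(parts)
        if last < 1:
            continue  # the final alternative must also receive some weight
        report = [p / n for p in parts] + [last / n]
        score = expected_log_score(belief, report)
        if score > best_score:
            best, best_score = report, score
    return best


# Illustrative subjective probabilities (an assumption, not study data).
belief = [0.60, 0.25, 0.10, 0.05]
print(best_report(belief))
```

Under these assumptions the search returns the belief vector itself: any
report that overstates or understates a belief lowers the expected score,
which is the sense in which a logarithmic rule is reproducible.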

Advocates of confidence testing believe that their procedures provide
more information and yield "fairer" scores than conventional multiple-choice
testing since measures of the level of student knowledge of each test item
are acquired rather than a simple indication that the student was right or
wrong. Instructors could thus identify the level of student knowledge and,
consequently, more accurately ascertain how and what additional teaching
should occur.

If, in fact, confidence testing does provide information concerning a
student's level of knowledge beyond that provided by conventional
multiple-choice tests, it would appear that its use in technical training
courses would allow instructors to tailor course presentations to correct
student weaknesses and make materials more meaningful to students, thus
enhancing the training program.

For the purpose of this study feasibility was defined in terms of student
course performance, student remediations, and student and instructor attitudes
toward the applicability and practicality of confidence testing in the setting
of technical training in the Air Force. Thus, confidence testing would be
deemed feasible if students subjected to confidence testing in their diagnostic
daily quizzes performed better and required fewer remediations in courses than
students not so exposed. Confidence testing would also be considered feasible
if students and instructors found the practice to be useful and not too time
consuming.

One factor influencing the feasibility of using confidence testing in
technical training was the relationship between confidence test scores and
various personality variables. Swineford (1938) first demonstrated a
relationship between early methods of confidence testing and examinee
personality when she derived a gambling score for each examinee which was
uncorrelated with the total test score. Thus, she concluded a confidence
test score was comprised of two parts, one for achievement, the other for
willingness to gamble. This study attempted to reevaluate this relationship
using modern methods of confidence testing after the subjects had
practiced with the methods.



SECTION II 



Sample

The setting for this study was the 3345th Technical School at Chanute
Air Force Base, Illinois. Two courses, Aerospace Ground Equipment Repairman
(AGE) and Jet Engine Mechanic (JEM), were chosen from the various courses
available for participation due to the high flow of students entering these
courses each week. Upon course entry students were assigned to a six-hour
instructional shift in a random fashion. The AGE course was divided into four
nonoverlapping shifts, while the JEM course utilized only two shifts. These
shifts were designated "A, B, C, and D" in AGE and "A and B" in JEM. The
instructional time of Shift A was from 0600 hours until 1200 hours; Shift B
from 1200 hours until 1800 hours; Shift C from 1800 hours until 2400 hours;
Shift D from 2400 hours until 0600 hours. Students entering the JEM course
were further assigned to different instructors within their shift. Both of
these courses were organized into a number of instructional blocks that were
of either a one- or two-week time period.

Since the experimenters were primarily interested in confidence testing
as applied to a multiple-choice format, the daily quizzes used in each course
were examined to determine a period where most daily quizzes given were
multiple choice in nature. After all daily quizzes were examined, Blocks two
and three were selected for further study from the JEM course, while Blocks
six, seven, and eight were selected from the AGE course.

All students entering these phases of the courses between October 14, 1970
and November 18, 1970 were selected as subjects in the experiment. These
students were primarily young men who had recently enlisted in the Air Force.
Data were collected for 434 students, 180 in AGE and 254 in JEM. The average
Airman Qualifying Examination (AQE) percentile ranks were approximately 70
for those students in AGE and 60 for those students in JEM. Further details
regarding the two courses under study can be found in the Plan of Instruction
for Jet Engine Mechanic and Aerospace Ground Equipment Repairmen (Air
Training Command, 1970 (a), (b)).



SECTION III 



Design 



The effects of three different methods of daily quiz testing on course
performance as measured by end of block examination scores were under study
in this experiment. In addition, the effects of two types of remedial
treatment and the interactions of the remediation type with testing type
were of interest.

Of the three methods of testing under study, two were experimental
confidence procedures while the third was a control procedure. The control
procedure consisted of traditional multiple-choice testing with four
alternative response items.

One confidence testing procedure, termed "Pick-One", required the examinee
to first choose the alternative he believed to be correct, exactly as he would
in a conventional multiple-choice test, and then indicate on a five-point
scale his sureness in his response. This scale ranged from "very sure",
indicating complete certainty on one end, to "not sure", indicating complete
ignorance on the other end. The points on the scale were designed to
correspond to various subjective probability levels for the chosen alternative.
A scoring scheme was devised that was reproducible as far as the probability
of the response chosen was concerned, though in the present experiment
confidence was rated and the reproducibility property approximated. A complete
description of this technique was given by Boldt (1971). The Pick-One
confidence testing method was devised for both examinee and test administrator
ease. It was felt that this method was the least demanding on both the
responder and the scorer. Scoring was simple as there were only nine possible
scores for an item and test administrators could remember these scores after
a little practice.
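
The scale probabilities and item scores actually used are given in Boldt
(1971) and are not reproduced in this section. The sketch below is therefore
only an illustration of one way a five-point scale can yield nine rather than
ten distinct item scores. It assumes hypothetical probability levels for the
five scale points, a logarithmic score, and an even spread of the remaining
probability over the three unchosen alternatives; none of these values is
taken from the study.

```python
import math

N_ALTERNATIVES = 4

# Hypothetical probability for the chosen alternative at each scale point
# (assumption): 1 = "not sure" ... 5 = "very sure".
LEVELS = {1: 0.25, 2: 0.40, 3: 0.60, 4: 0.80, 5: 0.95}


def item_score(level, is_correct):
    """Log score of the probability implied for the keyed answer."""
    p = LEVELS[level]
    # If the chosen alternative is wrong, the keyed answer is assumed to hold
    # an equal share of the probability left over.
    q = p if is_correct else (1 - p) / (N_ALTERNATIVES - 1)
    return round(math.log(q), 3)


scores = {item_score(level, correct)
          for level in LEVELS
          for correct in (True, False)}
print(len(scores), sorted(scores))
```

With these assumed levels, the "not sure" point assigns 0.25 to every
alternative, so a right and a wrong answer receive the same score and the ten
level-by-outcome combinations collapse to nine values.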



A second type of confidence testing used in this study, termed "Distribute
100 Points", approximated the method devised by Shuford and Massengill (1969).
Using this method, the examinee was first required to choose an alternative
and record that as being his selected answer. He then indicated his
subjective probability of each alternative's being correct by distributing
100 points over the various alternatives. A truncated logarithmic scoring
function was used. This method differed from that devised by Shuford and
Massengill only in that the examinees were asked to respond directly with
their subjective probability rather than use a response device such as the
SCoRule. Illustrations using both the Pick-One and Distribute 100 Points
methods appear in Appendix III.
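
A truncated logarithmic score of this general kind can be sketched as
follows. The truncation point and the logarithm base used in the study are
not stated in this section, so the floor of one point (probability 0.01) and
the base-10 logarithm below are assumptions, and the function name
distribute_100_score is hypothetical.

```python
import math

FLOOR = 0.01  # assumed truncation: treat fewer than 1 point as 1 point


def distribute_100_score(points, correct_index, floor=FLOOR):
    """Score one item from the points placed on the keyed alternative."""
    if any(p < 0 for p in points) or sum(points) != 100:
        raise ValueError("points must be non-negative and sum to 100")
    probability = points[correct_index] / 100
    # Truncation keeps the score finite when no points are on the key.
    return math.log10(max(probability, floor))


# Illustrative responses to a four-alternative item keyed on alternative 0.
print(distribute_100_score([70, 20, 10, 0], 0))   # most points on the key
print(distribute_100_score([25, 25, 25, 25], 0))  # complete uncertainty
print(distribute_100_score([0, 100, 0, 0], 0))    # confident and wrong
```

Without truncation the last response would score negative infinity; the
floor bounds the penalty for a confidently wrong answer.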



Two types of remediation were used in this study. A student was assigned
to a remedial session of two hours following his scheduled class if he
performed unsatisfactorily on the daily quiz (usually scoring below 70 percent),
had poor performance in the previous block, showed weakness in practical
performance, or missed class time due to sickness or leave. In each case the
assignment of a student to a remediation session was left to the discretion
of the individual instructor. One type of remediation was the standard or
control remediation procedure in use at the technical school. A special
remediation was devised as an alternative method. This method was based on the
notion that students responding incorrectly with high confidence should receive
a different type of instruction than students responding incorrectly with low
confidence. Students who were misinformed (wrong answer with high confidence)
would go through a two stage remedial process, first being instructed as to
why their responses were wrong and then why the correct answer was, in fact,
correct. Students who were simply not informed (wrong answer with low
confidence) would go through only a single stage remedial process, being
instructed as to why the correct answer was, in fact, correct. In this manner,
an initial step could be taken to allow instructors to tailor their remedial
instruction to the needs of the students. Additional discussion of the
remedial procedures can be found in Appendix I.
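
The distinction between misinformed and uninformed students can be expressed
as a simple routing rule. The sketch below is illustrative only: the cutoff
separating high from low confidence is not given in this section, so the
value 0.5 is an assumption, and the Response class and remediation_steps
function are hypothetical names.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE = 0.5  # assumed cutoff between low and high confidence


@dataclass
class Response:
    item: int
    correct: bool
    confidence: float  # confidence placed on the chosen alternative, 0 to 1


def remediation_steps(response, cutoff=HIGH_CONFIDENCE):
    """Return the special-remediation stages for one quiz response."""
    if response.correct:
        return []
    steps = ["explain why the correct answer is correct"]
    if response.confidence >= cutoff:
        # Misinformed: first address the wrong answer, then the correct one.
        steps.insert(0, "explain why the chosen answer is wrong")
    return steps


quiz = [Response(1, True, 0.9), Response(2, False, 0.9), Response(3, False, 0.3)]
for response in quiz:
    print(response.item, remediation_steps(response))
```

Item 2 (wrong with high confidence) receives the two stage process and item 3
(wrong with low confidence) the single stage process.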

The two factors, method of testing and method of remediation, when taken
in combination, produced six treatment combinations. These six treatment
combinations were then assigned in a random order within each instructional
shift to six classes as they entered the appropriate blocks under study.
Once a particular class entered the experiment and was assigned a particular
method of testing and remediation, it continued use of only that combination
until it concluded its part of the experiment. In the JEM course where an
entering class was subdivided and assigned to various instructors, everyone
in that entering class received the same treatment combination regardless
of his instructor, and continued using that treatment combination even
though the composition of subjects assigned to an instructor within a shift
changed from block to block.
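
The assignment scheme amounts to an independent random ordering of the six
treatment combinations within each shift. The sketch below illustrates this
under the assumption that a pseudo-random generator stands in for whatever
randomization device was used; the function name assign_treatments and the
seed are hypothetical.

```python
import itertools
import random

TESTING = ["Mult Ch", "Pick-One", "Dist 100"]
REMEDIATION = ["Control", "Special"]


def assign_treatments(shifts, seed=None):
    """Give each shift its own random ordering of the six combinations.

    The class entering in week i of a shift receives combination Ti.
    """
    rng = random.Random(seed)
    combos = list(itertools.product(TESTING, REMEDIATION))  # 3 x 2 = 6
    schedule = {}
    for shift in shifts:
        order = combos[:]
        rng.shuffle(order)  # independent shuffle for every shift
        schedule[shift] = {f"T{i}": combo for i, combo in enumerate(order, 1)}
    return schedule


age_schedule = assign_treatments(["A", "B", "C", "D"], seed=1970)
for shift, classes in age_schedule.items():
    print(shift, classes)
```

Because every shift receives all six combinations exactly once, each
treatment combination is represented in every shift of a course.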

The scheduling and the assignment of the various treatment combinations
appear in Figures 1 and 2. The rows represent the weeks of the experiment.
The columns denoted Ti, i = 1, 2, ..., 6, represent the particular testing
treatment combination used by the class entering the experiment at the ith
week in one of the shifts. The types of testing were coded as: Multiple
choice (Mult Ch), Pick-One (Pick-One), and Distribute 100 Points (Dist 100).
Thus, the class entering Block 6 of AGE in the first week of the experiment
in Shift A used Distribute 100 Points confidence testing with special
remediations until they completed Block 8. Similarly, the class entering
Block 6 of AGE in Shift B during the third week of the experiment used
Pick-One confidence testing with special remediation. The assignment of the
treatment combinations to the Ti was accomplished independently for each shift
in each course using a table of random numbers.

Two pieces of data were collected for each student as he completed
participation in the experiment: final end of block examination scores,
three for AGE and two for JEM, and the number of times each student was
assigned to remediation. These records were obtained from the technical
school student files.

Figure 1. Schedule for Aerospace Ground Equipment Repairman

[Schedule grid: weeks 1 through 10 (rows) by entering classes T1 through T6
(columns), each cell giving the block (6, 7, or 8) the class was attending.]

Treatment combination (type of testing, type of remediation) by shift:

        Shift A             Shift B             Shift C             Shift D
T1      Dist 100, Special   Mult Ch, Special    Pick-One, Control   Dist 100, Control
T2      Dist 100, Control   Dist 100, Control   Dist 100, Special   Mult Ch, Special
T3      Pick-One, Special   Pick-One, Special   Mult Ch, Special    Pick-One, Control
T4      Pick-One, Control   Dist 100, Special   Mult Ch, Control    Pick-One, Special
T5      Mult Ch, Control    Mult Ch, Control    Dist 100, Control   Mult Ch, Control
T6      Mult Ch, Special    Pick-One, Control   Pick-One, Special   Dist 100, Special


Figure 2. Schedule for Jet Engine Mechanics

[Schedule grid: weeks 1 through 9 (rows) by entering classes T1 through T6
(columns), each cell giving the block (2 or 3) the class was attending.]

Treatment combination (type of testing, type of remediation) by shift:

        Shift A             Shift B
T1      Pick-One, Special   Mult Ch, Control
T2      Mult Ch, Special    Mult Ch, Special
T3      Pick-One, Control   Pick-One, Special
T4      Dist 100, Control   Pick-One, Control
T5      Dist 100, Special   Dist 100, Special
T6      Mult Ch, Control    Dist 100, Control


Since Swineford (1938) was able to derive a gambling score from confidence
responses orthogonal to the total test score, a secondary consideration in
this study was how various personality factors affected the
confidence responses of the subjects taking the confidence tests. As of
now, though, few studies have been made of the effects of personality on
modern confidence response procedures. In order to examine the relationship
between personality factors and confidence responses, a battery of
personality tests was developed and administered to each class as it entered
the experiment. This battery included the Clayton and Jackson F-scale (1961),
Rokeach's Dogmatism scale (1956), the Alpert-Haber facilitating and
debilitating test anxiety scales (1960), the Gough-Sanford rigidity scale
(1957), Barratt's Impulsiveness scale (1959), and a self-sufficiency test
developed previously by Educational Testing Service. In addition,
modifications of Kogan and Wallach's (1964) risk-taking tests were used
yielding five scores for five different gaming strategies. These tests may be
found in Appendix II.

Each instructor was asked to return to the experimenters the last three
multiple-choice daily quizzes for the instructional block he taught. For
classes subjected to confidence testing, tests were scored using both the
standard rights-only scoring method and the appropriate scoring method for
confidence testing. In addition, an attempt was made to obtain AQE General
test scores as a measure of verbal ability. However, many of these test
scores could not be found in either the technical school records for the
two courses under study or the personnel files of the Air Force Human
Resources Laboratory, Personnel Division, Lackland Air Force Base. Therefore,
this information was not used in the study of the personality variables.

At the conclusion of a class's participation in the experiment, each
student was given a questionnaire concerning his attitudes toward the course
testing, in general, that takes place in the Air Force. Students who were
subjected to confidence testing were also asked about the difficulties they
experienced, their attitude toward the confidence tests, their response styles,
and the method's aid during remediation and review. Some questions were common
to all types of testing, some questions were common to both types of confidence
testing, and some were specific to each type of testing. These questionnaires
can be found in Appendix IV.

Instructors were given a questionnaire which asked, among other things,
how useful the confidence responses were, how much time was required for
administration, and what their attitudes were toward the testing process
in general. Also, instructors were asked to keep a log indicating the test
administration time, number of answer sheets scored, and the test correction
time for each type of testing they encountered. From this information,
scoring time per answer sheet was calculated for confidence and
multiple-choice procedures. This questionnaire can be found in Appendix VII.

It was hypothesized that: (1) individuals subjected to confidence testing
would obtain higher end of block examination scores than subjects using
regular multiple-choice testing; (2) the block examination scores would
respond in a similar manner to testing conditions, since there was no reason
to expect that blocks would respond differentially; (3) subjects using
confidence testing and having special remediation would obtain higher
examination scores than the other combinations of testing and remediation
type; and (4) confidence testing would require fewer remediations than
multiple-choice.






With regard to the intercorrelations of the personality tests, it was
hypothesized (1) that some unspecified personality variables would correlate
significantly with the confidence scores, (2) that when the multiple-choice
test variables were partialled out, significant correlations would remain
between the same unspecified personality variables and the confidence test
scores, and (3) that the confidence test score would correlate approximately
one with the multiple-choice test score.



SECTION IV



Results



As stated earlier, the various treatment combinations were assigned at
random to classes as they entered the experiment. Due to the large number
of instructors involved in the experiment in relation to the number of
students, it was necessary to confound the instructor variable. A description
of each of the univariate analyses used in this section can be found in
Fisher (1958). The following sections report the results of the analyses of
(1) the end of block examination scores, (2) the number of remedial sessions,
(3) student attitudes, (4) instructor attitudes, (5) personality variables,
and (6) the daily quiz administration and scoring time. This process was
carried out identically for every shift in both AGE and JEM.

Type of Testing and End of Block Scores

It was assumed that students were randomly assigned to their respective
shifts, allowing a three-way factorial analysis with the independent variables.
The three factors used as independent variables were type of testing, type of
remediation, and shift. The dependent variables used in this analysis were
the respective end of block examination scores earned while in the experiment.
Since these dependent variables were correlated, a multivariate analysis of
variance was used. The data and detailed analyses, including the univariate
F-ratios, appear in Appendix V. Detailed expositions of multivariate
analyses of variance can be found in Rao (1952), Morrison (1967), and Pruzek.

In the AGE course three end of block examination scores served as criteria
in the analysis. Under normal circumstances a three-way design incorporating
three types of testing, two remediation types, and four shifts would utilize
24 cells in a factorial design. However, in the present case only 19 cells
contained data, and two of these contained very little data. These missing
cells resulted from some administrative confusion by the instructors involved
with respect to the data collection system. In order to overcome this missing
data problem, the factor of remediation type was deleted as a main effect and
relegated to a nesting variable in a two-way design. This approach was deemed
feasible since the instructors had indicated, informally, that the special
remediation procedures were infrequently used. Since the procedures were used
infrequently, it was assumed that the remediation type effect was negligible.
Thus, a two-way factorial layout was conceptualized, with type of testing and
shift serving as factors and type of remediation used to nest classes within
the various treatment combinations. When this adjustment was performed, no
empty cells remained.

As a part of the analysis, the correlations among the end of block
examination scores within the error term of the design were calculated. These
estimates appear in Table 1 and were the proper within-cell estimates of the
population correlation coefficients. The values on the diagonal represent
the standard deviations of the respective end of block examination scores
and were used for the purpose of comparing group means.



Table 1

Correlations of the End of Block Examination Scores in
AGE with Standard Deviations on the Diagonal

Variable          Block 6 Score   Block 7 Score   Block 8 Score

Block 6 Score         6.697            .399            .536
Block 7 Score                         8.993            .534
Block 8 Score                                         7.665



End of block examination scores were recorded in terms of percent of
items correct, and the standard deviations were in terms of percentage points.
The correlations were all significantly nonzero, with the correlations between
Block 8 scores and the Block 6 and Block 7 scores being roughly the same. The
correlation between the Block 6 scores and Block 7 scores appeared to be
slightly lower.

When the shift by testing type interaction was tested for significance,
two of the three discriminant functions available were found significant with
probabilities less than .05. This was interpreted to mean that the testing
type effect depended upon the particular shift, block, and type of testing
that was being examined; hence, no overall main effects were examined for AGE.
In order to better understand this interaction, univariate one-way analyses
were performed within each shift, with the testing type effects being
calculated in each case. From this point on, the discussion of the analysis of
end of block examination scores in AGE will be presented shift by shift.

The analysis for shift A indicated that there were no significant
differences between the types of testing. Thus, it was concluded that in
shift A each of the types of testing on daily quizzes rendered the same
result.

In the analysis of testing type within shift B, one significant
discriminant function was found. An examination of the univariate F-ratios
indicated that a significant difference occurred only in Block 7. The effects,
in terms of deviation of means, of the various types of testing are given in
Table 2. The group using multiple-choice testing had the lowest average block
examination score, while Distribute 100 Points had the highest. The difference
between the multiple-choice mean and the Distribute 100 Points mean was 11.375,
which was slightly more than one standard deviation and was considered to be
large. No significant difference between the types of testing was found for
the remaining blocks.



Table 2

Effects (group mean - grand mean) of the Types
of Testing in Block 7 of Shift B in AGE

Type of Testing            Effect

Multiple-Choice            -6.667
Pick-One                    1.958
Distribute 100 Points       4.708



When shift C was analyzed, one discriminant function was found to be
significant. Univariate analyses on the Block 6, 7, and 8 scores produced
significant F-ratios on only the Block 6 scores. Table 3 indicates that
multiple-choice testing was definitely the least effective method in this
block, while Distribute 100 Points was the most effective method. The
difference between the means for multiple choice and Distribute 100 Points
was 10.187, which is about one and one-half standard deviations. Such a
difference was considered very large and substantial. Since no significant
differences were found in Blocks 7 and 8, the testing types were concluded to
be equally effective in these blocks.



Table 3

Effects (group mean - grand mean) of the Types
of Testing in Block 6 of Shift C in AGE

Type of Testing            Effect

Multiple-Choice            -5.354
Pick-One                     .521
Distribute 100 Points       4.833



No significant differences between testing types were found in shift D.
Thus there was no particular advantage in using any of the types of testing
on the daily quizzes in this shift.

In summary, of the four shifts and three blocks, twelve analyses in all,
only two analyses resulted in significant types of testing differences. In
both of these cases multiple-choice testing had the lowest mean block
examination score while Distribute 100 Points had the highest mean block
examination score.
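
The standard-deviation comparisons quoted for AGE can be checked with simple arithmetic. The following is a minimal sketch in Python and is not part of the original analysis; the function name `standardized_difference` is illustrative. The inputs are the mean differences reported in the text (11.375 for Block 7 of shift B, 10.187 for Block 6 of shift C) and the within-cell standard deviations on the diagonal of Table 1 (8.993 and 6.697).

```python
def standardized_difference(mean_difference: float, sd: float) -> float:
    """Express a difference between two group means in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_difference / sd


# Mean differences quoted in the text, divided by the Table 1 standard deviations.
shift_b_block7 = standardized_difference(11.375, 8.993)
shift_c_block6 = standardized_difference(10.187, 6.697)

print(round(shift_b_block7, 2), round(shift_c_block6, 2))
```

The two ratios come out near 1.26 and 1.52, which agrees with the descriptions "slightly more than one" and "about one and one-half" standard deviations.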

In the JEM course two variables served as criteria in a multivariate
analysis of variance. These two variables were end of block examination
scores received while in instructional Blocks 2 and 3. The design called
for a three-way factorial with type of testing, type of remediation, and
shift as factors. There were three types of testing as before, two types
of remediation, and two shifts, designated A and B. An initial examination
of the data showed that all 12 cells of the design contained data. The number
of observations was not uniform, as two cells had 6 and 8 subjects, while the
remaining cells contained between 1^ and 32 observations.

The correlation between Block 2 and Block 3 examination scores was
estimated to be .547, which was slightly higher than the correlations found
in AGE. The estimated standard deviations were 6.883 and o.O^o for Blocks 2
and 3, respectively.

When the testing type by remediation type by shift interaction was
tested, one discriminant function was found to be significant. In order to
better understand this interaction, the analysis was divided so that the types
of testing could be examined within the four combinations of shift and
remediation type. From this point on the analysis will be discussed by these
four groups.

One significant discriminant function was found when the types of testing
were considered in Shift A for classes using special remediation. An
examination of the univariate F-ratios yielded significant differences in both
blocks. As this was an unusual finding in multivariate analysis of variance, an
examination of additional statistics was undertaken in order to interpret this
result. The correlations between the discriminant variable, the appropriate
linear combination of block scores, and the block scores were examined. The
correlation between the discriminant variable and Block 2 scores was found to
be .997, indicating that the two were identical for all practical purposes. The
correlation between the discriminant variable and Block 3 scores was found to
be .61, which was low since this correlation could not be less than .547, the
correlation between the two block scores. Thus, it appeared that the
correlation between the block scores resulted in the univariate estimate of the
effect of the type of testing that was found in Block 3.

The effects of the various types of testing are given in Table 4. It
was apparent that multiple-choice testing was again low, while Distribute 100
Points testing was about two-thirds of a standard deviation higher. This
difference was notable but not as large as those previously reported. Pick-One
testing, on the other hand, differed greatly from multiple-choice testing in
that the difference was about two standard deviations.




Table 4

Effects (group mean - grand mean) of the
Types of Testing in Block 2 of Shift A
Using Special Remediation in JEM

Type of Testing            Effect

Multiple-Choice            -5.618
Pick-One                    6.942
Distribute 100 Points      -1.324



Two significant discriminant functions were found when the types of
testing were examined within Shift A when control remediation was used. Also,
the univariate F-ratios were significant for each instructional block. Thus
it was concluded that there were significant testing type effects in each block.
These effects are presented in Table 5. In Block 2 multiple-choice testing was
again low, with Distribute 100 Points being about three-fourths of a standard
deviation higher and Pick-One being about one standard deviation higher. In
Block 3, however, both multiple choice and Pick-One were low while Distribute
100 Points was approximately one standard deviation higher than multiple choice.



Table 5

Effects (group mean - grand mean) of the Types of
Testing in Shift A Using Control Remediation in JEM

Type of Testing            Block 2     Block 3

Multiple-Choice            -4.028      -2.225
Pick-One                    2.741      -1.032
Distribute 100 Points       1.287       3.257
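
A quick consistency check can be applied to the effects tables. If the grand mean is taken as the unweighted mean of the three group means (an assumption; the report does not state how the grand mean was formed), the effects in any one column must sum to zero. The sketch below, with an illustrative helper named `effects`, applies that check to the JEM Shift A effects as read from Tables 4 and 5. The group means passed to `effects` in the last lines are hypothetical.

```python
def effects(group_means):
    """Return group mean minus the unweighted grand mean for each group."""
    grand_mean = sum(group_means) / len(group_means)
    return [m - grand_mean for m in group_means]


# Effects as read from the report: Multiple-Choice, Pick-One, Distribute 100 Points.
columns = {
    "Table 4, Block 2": [-5.618, 6.942, -1.324],
    "Table 5, Block 2": [-4.028, 2.741, 1.287],
    "Table 5, Block 3": [-2.225, -1.032, 3.257],
}

# Each column should sum to zero apart from rounding in the third decimal.
column_sums = {name: sum(values) for name, values in columns.items()}
for name, total in column_sums.items():
    print(name, abs(round(total, 3)))

# Hypothetical group means, only to show the helper in use.
example_effects = effects([70.0, 80.0, 75.0])
print(example_effects)
```

Because the three legible columns all sum to zero within rounding, the same property offers a way to cross-check any entry in these tables that is hard to read.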



When the types of testing were analyzed in Shift B for classes using
special remediation, two discriminant functions were found to be significant.
When the univariate F-ratios were examined, significant F's were found in both
blocks. The estimates of the treatment effects are given in Table 6. In
each case the mean block grade was lowest for the group using multiple-choice
testing. Table 6 indicates that the difference between the means for the
group using multiple-choice testing and either confidence procedure was about
one standard deviation. There seemed little to choose between the Pick-One
and Distribute 100 Points methods in Block 2, while the Pick-One method appeared
to be superior to the Distribute 100 Points method in Block 3. In that block
the difference between the multiple-choice and Distribute 100 Points methods
was about one standard deviation, while the difference between the
multiple-choice and Pick-One methods was about two standard deviations and
therefore quite outstanding.



Table 6

Effects (group mean - grand mean) of the Types of
Testing in Shift B Using Special Remediation in JEM

Type of Testing            Block 2     Block 3

Multiple-Choice            -4.564      -5.561
Pick-One                    2.191       5.466
Distribute 100 Points       2.373        .095



No significant testing type differences were found in Shift B when the 
control remediation type was used. Thus, it was concluded that there was no 
difference in the block scores for the groups using the three types of testing 
in Shift B when the control remediation was used. 

In summarizing the results of the analysis of the end of block scores
for the various types of testing used in this experiment, one conclusion stands
out. Multiple-choice testing consistently resulted in the lowest block scores
when compared to Pick-One and Distribute 100 Points confidence testing. There
was some question whether Pick-One or Distribute 100 Points was superior, as
that seemed to depend upon the particular shift and type of remediation. The
analyses did seem to favor Pick-One for the JEM course, as that type of
testing appeared to be more often superior to Distribute 100 Points over all
shifts and remediation types where significance was found.

Type of Testing and Number of Student Remediations 

Another criterion for feasibility was the number of remediations required
for each student using the various types of testing under study. If confidence
testing could reduce the number of remediations required, it would be beneficial
to technical training. Therefore, the number of remediations each student
required was recorded for each block in each course used in the experiment.

The average number of remediations per student was calculated for each
group defined by type of testing, shift, and block of instruction. Using
these data, there appeared to be no appropriate statistical test for assessing
the significance of any type of testing differences since there appeared to
be no error term. The individual student data could not be used as a dependent
variable in an analysis of variance since the distribution of these data was
Poisson rather than normal. Therefore, interpretation was based upon the
consistency of the rankings of testing methods with respect to average number
of remediations within block and shift. In AGE there were 12 such block-shift
combinations, while in JEM there were four. The average numbers of
remediations per student are presented in Table 7 for AGE and in Table 8
for JEM.

In the AGE course, students who used multiple-choice testing required
more remediations on the average than did students using either of the
confidence testing procedures in 9 of the 12 shift-block combinations. The
difference between the two confidence methods was slight; the two methods
were the same with respect to the number of times they were ranked lowest
in the average number of remediations. Thus, in AGE, confidence testing
appeared to result in a reduction of the total number of remediations
required. The differences between Pick-One and Distribute 100 Points appeared
to be slight.
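
The consistency-of-rankings judgment described above amounts to counting the block-shift cells in which one method required more remediations than every other method. The sketch below shows one way to make that count. It is illustrative only: the function name is an assumption, ties are counted as "not higher" (the report does not say how ties were treated), and the averages are hypothetical rather than the report's data.

```python
def count_cells_where_highest(averages, method="Multiple-Choice"):
    """Count block-shift cells in which `method` needed strictly more
    remediations per student than every other testing method."""
    count = 0
    for cell in averages.values():
        others = [value for name, value in cell.items() if name != method]
        if all(cell[method] > value for value in others):
            count += 1
    return count


# Hypothetical averages keyed by (block, shift); not taken from Table 7 or 8.
example = {
    ("Block 1", "A"): {"Multiple-Choice": 1.2, "Pick-One": 0.5, "Distribute 100 Points": 0.4},
    ("Block 1", "B"): {"Multiple-Choice": 0.9, "Pick-One": 0.9, "Distribute 100 Points": 0.6},
    ("Block 2", "A"): {"Multiple-Choice": 2.0, "Pick-One": 0.3, "Distribute 100 Points": 0.7},
    ("Block 2", "B"): {"Multiple-Choice": 0.2, "Pick-One": 1.1, "Distribute 100 Points": 0.8},
}

highest_count = count_cells_where_highest(example)
print(highest_count, "of", len(example))
```

In this made-up example the count is 2 of 4, since one cell is a tie and in another multiple-choice is the lowest.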



Table 7

Average Number of Remediations Per Student in AGE

                          Shift A   Shift B   Shift C   Shift D

Block 6
  Multiple-Choice           1.4       2.5       2.3       1.1
  Pick-One                   .4       1.5        .5       1.1
  Distribute 100 Points      .1       1.5        .3       1.1

Block 7
  Multiple-Choice            .7       0.0        .8       1.4
  Pick-One                   .6       1.5        .2        .9
  Distribute 100 Points      .1       1.6        .3        .2

Block 8
  Multiple-Choice           1.1       1.9       1.8        .5
  Pick-One                  0.0        .6       1.0       1.0
  Distribute 100 Points                .9       1.0        .3


In JEM similar results were found. Table 8 shows that students subjected
to multiple-choice testing required more remediations than either confidence
method in three out of four shift-block combinations. As in AGE, there seemed
to be little difference between Pick-One and Distribute 100 Points confidence
testing with respect to the average number of remediations.

Thus, it was concluded that either Pick-One or Distribute 100 Points
reduced the number of remediations required when compared with multiple-choice
testing. It should be noted that this conclusion was based on judgment rather
than objective testing and therefore should be taken with caution.



Table 8

Average Number of Remediations Per Shift in JEM

                          Shift A   Shift B

Block 2
  Multiple-Choice           1.0        .8
  Pick-One                             .4
  Distribute 100 Points      .8        .3

Block 3
  Multiple-Choice           2.2        .7
  Pick-One                   .1        .7
  Distribute 100 Points      .1        .6



Student Attitudes 

A third feasibility criterion was that of student attitudes toward
confidence testing. If students favored one of the confidence methods over
the multiple-choice method to a significant degree, the process would be
considered feasible even though there were no real significant gains in
student achievement. At the conclusion of the subjects' participation in
the experiment, each subject was administered an attitude questionnaire that
asked him about the testing he had encountered thus far in the Air Force.
In addition, subjects using one of the confidence testing methods in the
experiment were asked about their testing behavior when using it as well as
their evaluation of the process.

Subjects were asked to respond to the questions on a five-point scale,
where two categories represented positive responses to the item, one a neutral
response, and two categories a negative response. The student attitude
questionnaire is shown in Appendix IV. Thus, although the responses were based
on a five-point scale, the responses could be classified into three categories
in order to compensate for low frequencies in response categories. Two-way
contingency tables were constructed for each item in the questionnaire, with
type of testing and response category serving as the two classification
variables. The resulting data were analyzed by the use of the chi-square
statistic. First, all five categories of response were used, but when an
expected cell frequency was less than five, the number of response categories
was reduced to three by pooling the two positive response categories and the
two negative response categories. Where an expected cell frequency remained
less than five, the response classification corresponding to that cell was
deleted.
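
The pooling rule just described can be sketched as a short procedure. The Python below is a minimal illustration, not the authors' program: the function names, the column order (two positive categories, neutral, two negative categories), and the response counts are all hypothetical. It computes expected cell frequencies for a testing-type by response-category table, pools five categories into three when any expected frequency falls below five, drops any category that is still too sparse, and then computes the chi-square statistic and its degrees of freedom.

```python
def expected_counts(table):
    """Expected cell frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom."""
    if len(table) < 2 or len(table[0]) < 2:
        raise ValueError("need at least two rows and two columns")
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def pool_five_to_three(table):
    """Pool the two positive and the two negative response categories."""
    return [[r[0] + r[1], r[2], r[3] + r[4]] for r in table]


def drop_sparse_columns(table, min_expected):
    """Delete response categories that still contain a small expected frequency."""
    expected = expected_counts(table)
    keep = [j for j in range(len(table[0]))
            if all(row[j] >= min_expected for row in expected)]
    return [[row[j] for j in keep] for row in table]


def analyze(table, min_expected=5.0):
    def too_sparse(t):
        return any(e < min_expected for row in expected_counts(t) for e in row)

    if too_sparse(table) and len(table[0]) == 5:
        table = pool_five_to_three(table)
    if too_sparse(table):
        table = drop_sparse_columns(table, min_expected)
    stat, df = chi_square(table)
    return stat, df, table


# Hypothetical counts: rows are Multiple-Choice, Pick-One, Distribute 100 Points.
counts = [[2, 10, 8, 6, 1],
          [3, 12, 7, 5, 2],
          [1, 9, 10, 7, 2]]
stat, df, used = analyze(counts)
print(df, len(used[0]))
```

With these made-up counts the five-category table has expected frequencies below five, so the analysis falls back to three pooled categories and four degrees of freedom.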

The results are divided into two parts. The first presentation covers
the attitude items answered by all subjects in the experiment. The second
presentation is for the items answered by those students taking either Pick-One
or Distribute 100 Points confidence testing. The complete frequencies of
response are given in Appendix VI, with the items where significant
differences were found noted.



Items common to each type of testing. In AGE three items out of six
items were found to have significantly different response patterns for the
three types of testing. Students using multiple-choice testing indicated
(1) more satisfaction with the testing they had been using, and (2) that their
testing more satisfactorily demonstrated what they really knew. Students
using Distribute 100 Points testing, on the other hand, expressed less
satisfaction than expected under null conditions. When asked how satisfied
they were with the testing that takes place in their classroom, the group using
Distribute 100 Points testing indicated more satisfaction than expected and
the multiple-choice group expressed less satisfaction. In each case the
group using Pick-One testing answered with frequencies close to those
expected under the null hypothesis. No significant chi-squares were found
with questions pertaining to the advantages of classroom testing, satisfaction
with testing as an aid to remediation, and test results as an aid to remediation.



A different pattern characterized the JEM course. Significance was
found in two of six items, only one of which was found to be significant in
AGE. These items dealt with one's satisfaction with the classroom testing
used and the advantages of classroom testing. In both cases the group using
Pick-One testing indicated a more favorable attitude toward the testing they
had used than expected under the null hypothesis.

When the results for these two courses are put together, there seemed to
be little basis for recommending any procedure over another. In JEM the
Pick-One type of testing seemed to be favored, but in AGE no such preference
can be seen, as there was no clear-cut method standing out in AGE. Since no
one method of testing appeared to be highly regarded in both courses, no
clear-cut conclusions can be made.

One variable confounded in the analysis was that of the students' 
instructors. It was not known whether students were reacting to the type 
of testing they had been using or whether they were responding to their 
instructors' teaching of their classes. 



Items common to the two types of confidence testing . Twenty-five questions 
were asked of students using the two types of confidence testing. These 
questions dealt with how the students responded in the face of uncertainty, 
their ease in marking their answers, and their evaluation of confidence 
testing as compared to multiple-choice testing. 



In the AGE course no significant chi-square values were found, indicating
that students subjected to the two types of confidence testing under study
responded similarly to the items. In general, students subjected to confidence
testing felt it was important to score high on the daily quizzes, understood
how the tests were graded (they were not explicitly told), were only fairly
accurate in marking their confidence, felt comfortable with the procedure,
and tended to be only neutral or slightly positive in their evaluation of
confidence testing as compared to multiple-choice testing. Further, students
acknowledged that confidence testing required greater thought before answering.
Students using Distribute 100 Points testing expressed little difficulty in
distributing their points in such a way that they summed to 100.

While no significant chi-squares were found in AGE, five items were found
to have significant chi-squares in JEM. Students using Pick-One testing
indicated they understood how the tests were graded and felt more comfortable
with the procedure to a greater extent than students using Distribute 100
Points testing. Students using Distribute 100 Points testing believed that
their testing identified a useful level of knowledge, required more thought
before responding, and was a useful device for relearning material to a
greater degree than did the students using Pick-One testing.

As in the AGE course, students in JEM using confidence testing indicated
they thought it was important to score high on the daily quizzes, felt they
understood the ways of marking their answers very well, were only fairly
accurate in marking their confidence when uncertain, found it easy to make a
decision on how to mark their confidence, tended to gamble sometimes in marking
their confidence, and tended to be favorable to confidence testing as compared
to multiple-choice testing. Students using Distribute 100 Points testing
indicated little difficulty in allocating the 100 points in such a way that
they summed to 100.

In summary, no one method emerged as favored over both courses. Since
results were mixed, with each of the types of testing showing some promise in
various situations, interpretation was difficult. Therefore, the only
conclusion drawn from the student attitude questionnaire was that no method was
preferred by the students over any other.

Instructor Questionnaires 



A questionnaire was given to each instructor who taught a class that
utilized a confidence testing procedure. This questionnaire contained
open-ended questions, which were completed by 37 instructors. The answers given
by the instructors were categorized into broad categories by the experimenters,
and specific comments were noted. The questions, along with the frequencies,
can be found in Appendix VII.

Typically, an instructor noted only a few students who placed large amounts
of confidence on wrong alternatives, even though most students responded with
the highest confidence marks. This result may have come about because most
students tend to score high on daily quizzes, 70 percent being the passing
mark. One instructor noted that students usually scoring low tended to have
more variation in the confidence attributed to the chosen alternative.




Most instructors were not influenced by the confidence responses and
did not utilize them in remediation. Instructors who did attempt to use the
confidence scores stated they used them only for identifying the lowest
ability students, so that they could concentrate on these students. The
instructors expressed difficulty in scoring, noting especially the lengthy
time required.

The instructors felt the students handled the testing situation easily,
in that there seemed to be sufficient time available and the method for
assigning confidence was not deemed difficult. One instructor noted that
difficulty in assigning confidence occurred only at the beginning of the
experiment, with the least competent students having the most difficulty.

Some instructors reported that their students found the procedure easy
because they usually assigned the highest confidence marks to every question.

One concluding remark was furnished by an instructor who typified the
instructors' attitudes; he stated, "It reminds me of using a bulldozer to
clear snow from a sidewalk -- it's too good. Percentages work well enough for
our tasks."

Personality Variables as Related to Confidence Testing 

As previously stated, it was desired to relate certain personality
variables to confidence test scores. Personality variables of interest were
dogmatism (DOG), authoritarianism (AUTH), facilitating anxiety (FAS),
debilitating anxiety (DAS), rigidity (RIGID), impulsiveness (IMP),
self-sufficiency (PRI), and risk-taking. Five betting strategies were taken as
measures of risk-taking: maximum gain (MG), minimum loss (ML), long shot (LS),
maximum variance (MB), and minimum deviation from one-half (HALF MD). The
risk-taking measures were modeled after those found in Kogan and Wallach's
chance bets instrument. Basically, this test consisted of 36 randomly ordered
pairs representing all possible combinations of three probabilities of winning
(1/3, 1/2, 2/3) and three stakes (15¢, 30¢, 60¢). All bets were of zero expected
value. The five strategy indexes had three different bases: two based on a
monetary amount, two based on probabilities, and one on a combination of money
and probability. The maximum gain strategy involved choosing that alternative
with the larger potential winnings, the minimum loss strategy involved choosing
that alternative with the smaller loss potential, the long shot strategy
involved choosing that alternative with the lower probability of winning, the
minimum deviation from one-half strategy involved choosing that alternative
with probability of winning that was closer to one-half, and the maximum
variance strategy involved choosing that alternative with the greater variance.
Each of these personality tests was administered to each subject as he entered
the experimental phase of the course.
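
The structure of the chance bets instrument can be illustrated with a short sketch. Nine bets (three probabilities by three stakes) give 36 distinct pairs, which matches the count stated above. Everything else in the Python below is an assumption made for illustration: the report does not say how a stake maps onto winnings and losses, so the sketch treats the stake as the amount that can be won and sets the loss so that the expected value is zero. The function and strategy names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

PROBABILITIES = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]
STAKES_CENTS = [15, 30, 60]


def make_bet(p, win):
    # Assumption: the stake is the amount won; the loss makes expected value zero.
    loss = p * win / (1 - p)
    return {"p": p, "win": Fraction(win), "loss": loss}


def expected_value(bet):
    return bet["p"] * bet["win"] - (1 - bet["p"]) * bet["loss"]


def variance(bet):
    # The mean is zero, so the variance is the expected squared outcome.
    return bet["p"] * bet["win"] ** 2 + (1 - bet["p"]) * bet["loss"] ** 2


BETS = [make_bet(p, s) for p, s in product(PROBABILITIES, STAKES_CENTS)]
PAIRS = list(combinations(BETS, 2))

# Each strategy scores a bet; the alternative with the higher score is chosen.
STRATEGIES = {
    "maximum gain": lambda b: b["win"],
    "minimum loss": lambda b: -b["loss"],
    "long shot": lambda b: -b["p"],
    "minimum deviation from one-half": lambda b: -abs(b["p"] - Fraction(1, 2)),
    "maximum variance": variance,
}


def choice(strategy, pair):
    """Return 0 or 1 for the preferred alternative, or None when the pair is tied."""
    score = STRATEGIES[strategy]
    first, second = score(pair[0]), score(pair[1])
    if first == second:
        return None
    return 0 if first > second else 1


print(len(BETS), len(PAIRS))
```

A subject's index for a strategy could then be taken as the number of pairs in which his choice agrees with `choice(strategy, pair)`; that scoring rule is also an assumption, since the report does not spell it out.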

Each instructor was asked to return to the investigators all answer
sheets for the last three daily quizzes administered as confidence tests in
his instructional block. Counts of the number of answer sheets returned were
obtained for each daily quiz in each instructional block and each type of
confidence testing. In AGE a sufficient number of subjects could not be found
for any particular daily quiz since each shift used a different set of quizzes
for the same subject matter. Therefore, analysis of personality variables was
confined to the JEM course. Two daily quizzes were found in JEM for each type
of testing with a sufficiently large sample size to merit further analysis.
The respective numbers of students taking these daily quizzes were 106 and 105
for the two quizzes chosen in the Pick-One case and 83 and 73 in the Distribute
100 Points case.

Each of the daily quizzes under consideration was scored in two ways:
(1) a count of the total number of items answered correctly, termed the
rights-only score, was made; (2) a confidence score based on the specific
scoring function for confidence testing was obtained. Preliminary correlations
were calculated among the rights-only score, the confidence score, and the
shift in which the instruction took place. Shift was used as a variable in
this analysis since it was suspected to be correlated with the test scores.
Such correlation matrices were calculated for each of the four tests under
consideration. The results of these calculations indicated that there was a
significant association between the rights-only score and the shift, thus
implying the need to remove or partial the shift variable from any further
correlations.

The correlations between the confidence test scores (CON), the rights-only
score (rights), and the various personality variables are given in Tables
9, 11, 13, and 15. The correlations between confidence test scores and the
personality variables having partialled the rights-only score are given in
Tables 10, 12, 14, and 16. These correlations were the proper within shift
estimates of the population correlations. Only correlations significant at the
.05 level of significance are reported.
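
As an illustration of what partialling out a score means, the standard first-order partial correlation can be computed from three ordinary correlations. This is a sketch of the textbook formula, not a reproduction of the authors' computations, which also removed the shift variable; the function name `partial_corr` is hypothetical.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial).

    Here x might be the confidence score, y a personality variable, and
    z the rights-only score.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("partial correlation undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denom
```

When the correlation between x and z is very high, the denominator becomes small, so modest differences among the raw correlations can produce large and unstable partial correlations.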

From Tables 9 and 11, it should be noted that for daily quizzes administered
to those students using Pick-One confidence testing, the two ways
of scoring the test correlated .92 and .89. These correlations are extremely
high and indicate that it makes little difference how the tests are scored.

Also, it may be seen by comparing Tables 9 and 11 with Tables 10 and 12 that
the intercorrelations among the personality variables tend to remain
stable whether or not the rights-only score was partialled out. This occurred
in part because the correlations between rights-only and the personality scores
were small in absolute value. However, the correlation between rights-only
and the confidence score was sizeable, and partialling out the rights-only
score affected the correlations of confidence scores with the personality
variables. The resulting partial correlations are quite different when the
results of Quiz 1 are compared with those of Quiz 2. Hence, unless there is
some crucial difference between the two quizzes that produces these differences,
one would tend to attribute them to randomness in the system. It should be
noted in this connection that many coefficients were generated and compared
on these data, and some apparently significant results are likely to appear by
chance. The significance tests are probably much less conservative than usual,
and it would not be uncommon to find "significant" relations that were only
apparent in this situation. Note, for example, that authoritarianism correlated
significantly with the confidence score for both quizzes but with opposite signs.
The significance of this correlation helps substantiate the hypothesis that
assigning high confidence is associated with some personality trait. However,
the fact that the correlation coefficients were opposite in sign conflicted
with this notion.

Another approach to studying the relation between personality and
confidence score was to estimate the contribution that personality test
scores make to the prediction of confidence test scores over and above that
made by shift and by the number right score. Shift was included as a
predictor because of the possibility of events taking place in the shift which
have an effect on the confidence score. The number right was included because
it was supposed that the value of the confidence score lies in the fact that
its deviations from the number right are meaningful. Clearly, meaningful
variation of the confidence scores that was shared with the number right did
not help one decide between them. Hence, the variation of interest is that
which was not shared with the number right and further should not occur merely
because of temperamental or personality dispositions. Thus, analyses were
performed wherein the residual variation after the confidence test scores were
predicted, using the personality variables, shift, and number right, was
compared with the residual variation after only shift and the number right were
used on the predictor side. The residual variation was larger when the smaller
number of predictor variables was used, but not by enough to be significant. The
test of the size of the residual variation was made with an F-ratio which takes
into account the spurious accuracy achieved by fitting more predictors, and the
test indicated that the additional accuracy added by the personality variables
was well within that which might be expected by chance. Thus, it was concluded
that personality variables were not related to the confidence score and that
the scoring systems under study were so similar that it seems to make little
difference which scoring system is used.
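
The comparison of residual variation described above has the form of the usual F-ratio for nested regression models. The following is a minimal sketch of that general test, assuming ordinary least squares fits with an intercept; it is not the authors' program, and the function name and argument names are hypothetical.

```python
def nested_f_ratio(rss_reduced, rss_full, n_obs, k_reduced, k_full):
    """F-ratio for the predictors added in the full model.

    rss_reduced: residual sum of squares using shift and number right only.
    rss_full:    residual sum of squares after adding the personality variables.
    k_reduced, k_full: number of predictors (excluding the intercept).
    Returns (F, numerator df, denominator df).
    """
    df_num = k_full - k_reduced
    df_den = n_obs - k_full - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    if rss_full <= 0:
        raise ValueError("full-model residual sum of squares must be positive")
    f_ratio = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_ratio, df_num, df_den
```

Dividing the reduction in residual variation by the number of added predictors is what corrects for the spurious accuracy gained simply by fitting more variables. The resulting value is referred to an F distribution with the returned degrees of freedom.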

The results for the quizzes in the Distribute 100 Points format were
similar to those of the Pick-One format. The estimates of the population
correlations were given in Tables 13 and 15 and with the rights-only score
partialled out in Tables 14 and 16. Although the intercorrelations among
the personality variables appear to be stable over the two quizzes, the
intercorrelations of the personality variables with the confidence scores
were unstable when the rights-only score was partialled out. In order to
test simultaneously the significance of the correlations between personality
and confidence score for each daily quiz, multiple regression analyses were
performed for each daily quiz, using the personality variables as predictors
and the confidence test score as the dependent variable with shift and
rights-only score serving as covariates. In each case, F-ratios were found to
be not significant. Thus, it was concluded that the significant first order
correlations obtained were a result of randomness.

In summary, no personality variables were found to relate to confidence 
test score when the influence of rights-only score was removed. The rights- 
only and the confidence test scores in both the Pick-One and Distribute 100 
Points formats were found to be so highly related that there appeared to be no 
practical difference in the scoring systems other than changing the scale of 
measurement.



[Tables 9 through 16: correlation matrices for the personality variables,
rights-only scores, and confidence test scores. The table entries are not
legible in this copy. Surviving captions: "Table 12"; "Table 14";
"Correlation Matrix for Personality Variables, Rights-Only ...";
"Correlation Matrix for Personality Variables and Distribute 100 Points
Confidence ...". Surviving table notes: "* indicates r < .139" and
"* indicates r < .155."]



Quiz Administrative Time and Scoring 

Each instructor involved in the experiment was asked to keep a log
indicating the length of time required to administer and score the daily
quizzes given during his instructional period. In general, instructors
using conventional multiple-choice testing kept records for every quiz
given, while instructors using one of the confidence procedures kept
records only for the quizzes they actually scored using their particular
confidence format. Distributions of reported testing and scoring times were
obtained and appear in Table 17. In addition to the distributions, statistics
for the mean scoring time (minutes) required to score an answer sheet were
obtained for each type of testing under study in each course. Table 17
indicates that although the mean test administration times were significantly
different statistically, the difference was only one or two minutes in the
Jet Mechanic course and thus probably not significant in a practical sense.

In AGE, on the other hand, tests administered with Pick-One confidence testing
were found to require less time for administration than either the conventional
multiple-choice or the Distribute 100 Points confidence test procedures. This
finding must have occurred as a result of the rough estimates of testing
administration time, since the task required of an examinee using
multiple-choice testing was only a part of what is required of an examinee
using Pick-One confidence testing.

For both courses the table reflected extreme differences in the time
required to score the daily quizzes. Roughly speaking, the time required to
score a Distribute 100 Points confidence test was about twice that required
for scoring a multiple-choice test. The Pick-One confidence test scoring
required even longer.

The probable reason for the greater time required for the Pick-One
confidence testing procedure than for the Distribute 100 Points testing was
the use of a real number scale rather than the integer scale used for
Distribute 100 Points. A modification of the scales used for hand scoring has
been recommended elsewhere (Echternacht, Boldt, & Sellman, 1971). These
modifications reduced the Pick-One scale to an integer format and reduced
the number of possible scores in the Distribute 100 Points case. Had the
scale for Pick-One confidence testing been in integer form, it was hypothesized
that this method would have reduced the time for scoring Pick-One tests to a
level less than that for Distribute 100 Points.







SECTION V 



Conclusions 



Conclusions from this study may be grouped into four categories:
(1) those involving directly measurable criteria, i.e., end of block
examination scores and the average number of student remediations;
(2) those involving student and instructor attitudes; (3) those involving
the relationship between personality and confidence marking; and (4) those
involving the quiz administration and scoring time.

1. Conclusions resulting from the analysis of the directly measurable
criteria.

a. The effectiveness of the two types of confidence testing
under study, Pick-One and Distribute 100 Points confidence
testing, is dependent upon the type of remediation used
and the shift in which the procedures were used.

b. When significant differences between types of testing occurred
with respect to end of block examination scores, multiple choice
testing was always low, and the difference between multiple
choice testing and the confidence procedures was large.

c. The effects of using either the Pick-One or the Distribute 100 
Points method were mixed. Neither method appeared to be 
superior to the other. 

d. Multiple-choice testing resulted in more remediations being 
required, on the average, than either confidence testing 
procedure. Neither confidence testing procedure was superior 
to the other with respect to average number of remediations 
required. 

2. Conclusions resulting from the analysis of student and instructor
attitudes.

a. Although items were found favoring each of the three types of
testing under study, no one method emerged as being highly
preferred. It is concluded that students are indifferent to
the type of testing method to be used for daily quizzes.

b. Instructors tended not to use the confidence marks in planning
remediation.

c. Instructors expressed difficulty with the scoring of the
confidence testing and objected to the length of time required
to score them.




3. Conclusions resulting from the analysis of the personality data.

a. Tests scored with a rights-only scoring formula correlate
so highly with confidence test scores, using either the
Pick-One or Distribute 100 Points methods, that the use of a
confidence score seems unnecessary from a psychometric
point of view.

b. Although some personality trait scores correlate with
confidence test scores to a significant degree when
rights-only score is partialled out, these correlations
do not appear to be stable from one test to another.

c. Various personality traits do not contribute significantly
to the prediction of confidence test scores when the influence
of rights-only score is eliminated.

4. Conclusions resulting from the analysis of quiz administration and
scoring time.

a. Although the difference is statistically significant, there
is no practical difference in the time required to administer
daily quizzes in technical training as confidence tests.

b. The time required to score the Distribute 100 Points method of
confidence testing is about twice that required for multiple
choice. In the case of the Pick-One method, the time required
is slightly more than that required for the Distribute 100
Points method.

c. Simplified scoring tables could be developed that should yield 
Pick-One scoring times that are closer to the multiple choice 
times. 






SECTION VI 



Recommendations



1. Since the results of the study did not indicate an overwhelming
advantage for confidence testing, it is recommended that the implementation
of any confidence testing program be undertaken with caution.

2. Further work is required on simplifying the scoring procedure used by
the instructors.

3. Confidence testing does merit consideration as a method for diagnostic
testing in technical training since students using the procedure perform
as well or better on end of block examinations than students using
conventional multiple-choice testing, and the number of remediations
required seems to decrease when confidence testing is used.

4. Further work is required on developing systems for using confidence
responses in remediation.



COMMENT: Since it appears that confidence testing affects subsequent
performance, it may be that it should be used to a greater extent in
remediation procedures. It may also be that the results of confidence
testing would be used in remediation more than they were here if more use
of intermediate levels of certainty by the examinees could be brought about
(possibly through the use of the modified Pick-One procedure and scoring).
Such a project would require more adjustment of the teaching procedures,
particularly remediation procedures, than was possible in the present study.









REFERENCES 



Air Training Command. Plan of instruction: Jet engine mechanic. POI 3ABR43230.
3345th Technical School, Chanute AFB, Ill., 1970. (a)

Air Training Command. Plan of instruction: Aerospace ground equipment repairman.
POI 3ABR42133. 3345th Technical School, Chanute AFB, Ill., 1970. (b)

Alpert, R., & Haber, R. N. Anxiety in academic achievement situations. Journal
of Abnormal and Social Psychology, 1960, 207-215.

Barratt, E. S. Anxiety and impulsiveness related to psychometric efficiency.
Perceptual and Motor Skills, 1959, 9, 191-198.

Boldt, R. F. A simple confidence testing format. AFHRL-TR-71-31. Lowry AFB,
Colo.: Technical Training Division, Air Force Human Resources
Laboratory, July 1971.

Clayton, M. B., & Jackson, D. N. Equivalence range, acquiescence, and
overgeneralization. Educational and Psychological Measurement, 1961, 21,
371-382.

Coombs, C. H., Milholland, J. E., & Womer, F. B. The assessment of partial
knowledge. Educational and Psychological Measurement, 1956, 16,
13-37.

de Finetti, B. Methods for discriminating levels of partial knowledge
concerning a test item. British Journal of Mathematical and
Statistical Psychology, 1965, 18, 87-123.

Ebel, R. L. Confidence weighting and test reliability. Journal of Educational
Measurement, 1965, 2, 49-57.

Echternacht, G. J., Boldt, R. F., & Sellman, W. S. User's handbook for confidence
testing as a diagnostic aid in technical training. AFHRL 71-34. Lowry
AFB, Colo.: Technical Training Division, Air Force Human Resources
Laboratory, July 1971.

Echternacht, G. J. The use of confidence testing in objective tests. AFHRL 71-32.
Lowry AFB, Colo.: Technical Training Division, Air Force Human Resources
Laboratory, July 1971.

Fisher, R. A. Statistical methods for research workers. (13th ed.) New York:
Hafner, 1956.

Gough, H. G. Manual for the California Psychological Inventory. Palo Alto,
California: Consulting Psychologists Press, 1957.

Hevner, K. A. A method of correcting for guessing in true-false tests and
empirical evidence in support of it. Journal of Social Psychology,
1932, 359-362.







Kogan, N., & Wallach, M. A. Risk taking: A study in cognition and personality.
New York: Holt, Rinehart and Winston, 1964.

Morrison, D. F. Multivariate statistical methods. New York: McGraw-Hill, 1967.

Pruzek, R. M. Methods and problems in the analysis of multivariate data. Review
of Educational Research, 1971, 41, 163-190.

Rao, C. R. Advanced statistical methods in biometric research. New York:
Wiley, 1952.

Rokeach, M. Political and religious dogmatism: An alternative to the
authoritarian personality. Psychological Monographs: General and
Applied, 1956, 70, 18.

Shuford, E. H., Albert, A., & Massengill, H. E. Admissible probability
measurement procedures. Psychometrika, 1966, 125-145.

Shuford, E. H., & Massengill, H. E. Confidence testing at the officer training
school, Lackland Air Force Base, September 1968. Lexington, Mass.:
Shuford-Massengill Corporation, 1969.

Swineford, F. Measurement of a personality trait. Journal of Educational
Psychology, 1938, 22, 295-300.



APPENDIX I 



Procedures and Scoring 

Procedures

Log Sheet for Scoring

Scoring by the Logarithmic Function

Scoring Table for Pick-One Confidence Testing

Example

Answer Sheet Examples






CONFIDENCE TESTING AS A DIAGNOSTIC AID IN TECHNICAL TRAINING



1.0 Introduction: During the next several weeks, classes in the Aerospace
Ground Equipment Repairman and Jet Engine Mechanic courses will take
part in a study of confidence testing as a diagnostic aid. Blocks 6,
7, and 8 have been selected for study in the AGE course and blocks 2
and 3 for the Jet Engine course. This is the study on which a briefing
was given to the instructors on 13 August.

2.0 Objectives: The objectives of this study are two in number. The first is
to determine the feasibility of using confidence testing as a diagnostic
evaluative aid to instructors in AF technical training courses. This
means that the study is designed to determine how well confidence testing
helps you find out exactly what your students know, and how well,
and just what they are not sure of or think they know when actually they
don't. The second purpose is to determine the cost effectiveness of
confidence testing vs. that of conventional multiple choice testing practices
currently in use in AF technical training courses. Feasibility
will be evaluated in terms of student performance, attitudes toward the
applicability and practicability of confidence testing, and a cost
effectiveness analysis.



3.0 Tasks to be performed: The design is set up so that there will be
six different kinds of classes for each shift of each course. You as
instructors will be responsible for a number of tasks.



3.1 Personality tests: There will be three personality tests that need
to be administered as soon as possible. Test 1 asks the student to express
agreement or disagreement with a number of opinions. Test 2 is a
test of willingness to take risks. Test 3 consists of a number of statements
about feelings, tendencies, and preferences of the student which
either characterize him or are uncharacteristic of him. Every student
should respond to every item on the tests; if a student cannot take the
tests at the same time as the others, you should make arrangements for him
to take these tests as soon as possible. The best possible time for these
tests to be administered is the first day of class in block 6 of the AGE
course and block 2 of the Jet Engine course. This will be the only time
these tests will be administered. It is absolutely necessary for every
student to answer all the items in the personality tests once. Make sure
that each student places his name and social security number in the upper
right hand corner of each test.



3.2 The types of testing groups: You will be asked to conduct all your
quizzes in one of three ways for a given group.



3.2.1 Conventional testing: Some groups will use the method that is now
in use. In this method the student simply marks the answer that he thinks
is most likely to be correct. Directions for the students are included.
There should be no writing on the directions as these are to be used again
for each testing period.








3.2.2 Pick-one confidence testing: In this method the student is asked to
choose the answer that he thinks is most likely to be correct as in 3.2.1
and then indicate how sure he was that the answer he marked was in fact the
correct one. This is done on a five point scale that appears on the right
of the answer sheet. Directions for the students are included. There should
be no writing on the directions as these are to be used again for each testing
period.



3.2.3 Distribute 100 points confidence testing: In this method the student
first indicates the answer he thinks is most likely to be correct and
marks that one; then he shows his feelings about the possible alternatives
by distributing 100 points over the alternatives, placing the most points on
the answer that he has marked and a lesser number on any of the alternatives
that he feels might be correct. Directions for the students are included.
There should be no writing on the directions as these are to be used again
for each testing period.



3.3 The types of remediation groups: In addition to using the above-mentioned
types of testing, you will conduct your remediation according to two
different types of remediation.



3.3.1 Control remediation: Control remediation refers to that remediation
that you are now using. You should assign people to remediation exactly as
you do now and conduct remediation as you have in the past. An important
point here is not to adjust your remediation in light of the special
remediation that is described next. The goal of the study is to compare the
present method with the method that follows, which makes it necessary for you
to conduct your remediation exactly as you do now when using control remediation.

3.3.2 Special remediation for conventional testing:

1. The first step in this method is to decide just who is to attend
remedial. This should be done as you have in the past.

2. After you decide who is to attend remedial, make a list of all the
items that every student in the remedial group missed. These items
are the common group of items missed by everyone in remedial. During
the remediation, you must explain why the answers they marked
were wrong and why the correct answer was right.

3. You should make a second list of questions that only some of the
people in the remedial group missed. Since some in this group
answered correctly, have the students who answered correctly
explain why the others were wrong and why the correct answer was
right during remediation.

4. The basic principle involved is that every student in remediation
should know why his wrong answers were wrong and why the right
answers were correct.
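
The two lists called for in steps 2 and 3 amount to simple set operations on the items each remedial student missed. The following sketch is offered only as an illustration of that bookkeeping; the function name `remediation_lists` and the data layout are hypothetical and were not part of the instructions given to instructors.

```python
def remediation_lists(missed_by_student):
    """Build the two item lists used in special remediation.

    missed_by_student maps each remedial student's name to the collection
    of item numbers that student missed.
    Returns (items missed by everyone, items missed by only some,
             students able to explain each partially missed item).
    """
    groups = [set(items) for items in missed_by_student.values()]
    if not groups:
        return set(), set(), {}
    missed_by_all = set.intersection(*groups)
    missed_by_some = set.union(*groups) - missed_by_all
    # For each item missed by only some, list the students who got it right
    # and can therefore explain it to the others.
    explainers = {
        item: sorted(
            name for name, items in missed_by_student.items()
            if item not in items
        )
        for item in sorted(missed_by_some)
    }
    return missed_by_all, missed_by_some, explainers
```

The instructor explains the items in the first list himself, while each item in the second list can be explained by one of the students listed for it.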








3.3.3 Special remediation for confidence testing:

1. The first step here is once again to decide who is going to have
to attend the remedial sessions. This should be done as you have
in the past.

2. Once you have decided who is to attend remedial, make a list of
all the questions that every student assigned to remediation
missed.

3. Then, look at the confidence they placed in their wrong answers
to these questions. If they placed a large amount of confidence
in their answers, a great deal of time must be spent explaining
why their answers were wrong and then why the correct answer is
right. If they placed a small amount of confidence in their answers,
less time may be spent on explaining why their answers were
wrong. The question of how much is a large amount of confidence
and a small amount of confidence should be decided by yourself.
For general purposes, though, we can say that a large amount of
confidence is 60 or more in the system where 100 points are
distributed and when either of the top two confidence responses
is checked in the system where you indicate confidence only in
your answer. A small amount of confidence is between 25 and 40
in the one system and when either of the lower two responses is
checked in the other system.

4. A list of questions should be made that only some of the remedial
group missed. In this case the people who answered the question
correctly should explain.

5. The general principle to be followed here is the same as previously,
that being, the more confidence an individual places in a wrong
answer, the more time is required to show that student why his
answer is wrong.



... three times. You must encourage them to answer honestly and under no
circumstance tell them that their confidence marks do not count in grading.








3.5.1 Special scoring for the pick-one confidence testing:

1. The first task that you must do is to examine the choices to the
left. Circle any of those answers that are incorrect.

2. Next, look up the scores on the scoring table for the answers that
have been marked correct and add these together for the individual.

3. Next, look up the scores on the scoring table for the circled
answers and add these together for that same individual. To obtain
the total score, take the score from 2 and decrease that score by
the amount determined in 3.

3.5.2 Scoring for the distribute 100 points confidence testing:

1. First, circle the number of points that the student has given the
correct answer. You can ignore the answer on the left hand side
of the paper.

2. Next, look up on the scoring table the score that corresponds to
the number of points that is given to the correct answer.

3. Add these numbers together to obtain the total score. When a
student places less than 10 points on the correct answer, notice
that his score is negative.



3.6 Student questionnaires: At the end of block 8 in the AGE course and
block 3 in the Jet Engine course on the last class day, you are to administer
one of three questionnaires to the students. The form of the questionnaire
that you administer to the particular class depends on the kind of
testing that group has been using over the past weeks. The first form is
used for students that use the conventional testing, the second form for
the pick-one confidence testing, and the third form for the distribute 100
points confidence testing.

3.7 Instructor questionnaires: The questionnaire that you are asked to
fill out is of the open ended variety. That means that you are to respond
freely and as completely as possible. The more information you can give us
the better we will be able to make recommendations for future applications.
These questionnaires should be filled out immediately after the last group
that is involved in the experiment finishes your block.




COURSE                    BLOCK                    BLOCK TITLE

INSTRUCTOR                    SHIFT

            Time for Test     Time for           Type of Scoring     Number Answer
Comments    Administration    Scoring (Mins.)    Regular  Special    Sheets Scored



SCORING TABLE FOR DISTRIBUTE 100 POINTS CONFIDENCE TEST

Points on the              Points on the
Correct Answer    Score    Correct Answer    Score

       0          -100
       1          -100          51             71
       2           -70          52             72
       3           -52          53             72
       4           -40          54             73
       5           -30          55             74
       6           -22          56             75
       7           -15          57             76
       8           -10          58             76
       9           -05          59             77
      10            00          60             78
      11            04          61             79
      12            08          62             79
      13            11          63             80
      14            15          64             81
      15            18          65             81
      16            20          66             82
      17            23          67             83
      18            26          68             83
      19            28          69             84
      20            30          70             85
      21            32          71             85
      22            34          72             86
      23            36          73             86
      24            38          74             87
      25            40          75             88
      26            41          76             88
      27            43          77             89
      28            45          78             89
      29            46          79             90
      30            47          80             90
      31            49          81             91
      32            51          82             91
      33            52          83             92
      34            53          84             92
      35            54          85             93
      36            56          86             93
      37            57          87             94
      38            58          88             94
      39            59          89             95
      40            60          90             95
      41            61          91             96
      42            62          92             96
      43            63          93             97
      44            64          94             97
      45            65          95             98
      46            66          96             98
      47            67          97             99
      48            68          98             99
      49            69          99            100
      50            70         100            100
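
The entries of this table appear to follow a logarithmic rule, namely 100 times the base-10 logarithm of one tenth of the points placed on the correct answer, rounded to the nearest integer and held at -100 when no points are given. That rule is an inference from the printed values, not a formula stated in these instructions, and it differs from the table at one entry: for 30 points it gives 48 where the table shows 47. A minimal sketch, with hypothetical function names:

```python
import math


def distribute_100_item_score(points_on_correct):
    """Approximate table score for one item (inferred logarithmic rule)."""
    if not 0 <= points_on_correct <= 100:
        raise ValueError("points must lie between 0 and 100")
    if points_on_correct == 0:
        return -100  # the table holds the score at -100 for zero points
    return round(100 * (math.log10(points_on_correct) - 1))


def distribute_100_total(points_per_item):
    """Total quiz score: the sum of the item scores, as in step 3 of 3.5.2."""
    return sum(distribute_100_item_score(p) for p in points_per_item)
```

Under this rule a student who spreads his points evenly over four alternatives (25 points each) earns 40 per item, and any allocation below 10 points on the correct answer yields a negative item score, as the scoring instructions note.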






SCORING TABLE FOR PICK-ONE CONFIDENCE TESTING

                                                      Score

If the answer is correct and the confidence is

        Very Sure                                      1.00
        Sure                                            .89
        Fairly Sure                                     .56
        Not Very Sure                                   .36
        Not Sure                                        0

If the answer is wrong and the confidence is

        Very Sure                                     -1.67
        Sure                                           -.89
        Fairly Sure                                    -.33
        Not Very Sure                                  -.17
        Not Sure                                        0
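As a minimal sketch of how the Pick-One scoring table above could be applied, the following looks up the score for each item from its confidence level and correctness and sums over a test. The dictionary values are copied from the table; the function names and the (confidence, is_correct) input format are assumptions made for illustration.

```python
# Scores transcribed from the Pick-One table: (score if correct, score if wrong).
PICK_ONE_SCORES = {
    "very sure": (1.00, -1.67),
    "sure": (0.89, -0.89),
    "fairly sure": (0.56, -0.33),
    "not very sure": (0.36, -0.17),
    "not sure": (0.0, 0.0),
}


def pick_one_item_score(confidence: str, is_correct: bool) -> float:
    """Return the tabled score for one item."""
    correct_score, wrong_score = PICK_ONE_SCORES[confidence.lower()]
    return correct_score if is_correct else wrong_score


def pick_one_total(responses) -> float:
    """Sum item scores over an iterable of (confidence, is_correct) pairs."""
    return sum(pick_one_item_score(conf, ok) for conf, ok in responses)


if __name__ == "__main__":
    # Hypothetical three-item sheet: one confident hit, one confident miss,
    # and one pure guess.
    sheet = [("very sure", True), ("very sure", False), ("not sure", True)]
    print(round(pick_one_total(sheet), 2))
```

Note the asymmetry in the table: a wrong answer marked "very sure" loses more (1.67) than a right answer at that level gains (1.00), which penalizes overconfidence.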



The following is an example of a 15-item multiple-choice test
administered as a confidence test using both Pick-One and Distribute 100
Points formats. This test served as a pretest of the confidence systems
as it was given to approximately 20 ETS employees not associated with the
project. The questions involve mostly information known to many people
in the Princeton, New Jersey geographic area. An attempt was made to
include questions of varying difficulty.

Two answer sheets, one for each type of testing, follow. The answer
sheets show how an examinee might respond to the given quiz.



EXAMPLE


1. The colors on a New Jersey license plate are

(A) black and white

(B) blue and yellow

(C) black and cream*

(D) brown and cream

2. The 1969 population of Princeton, New Jersey was estimated to be

(A) 11,890*

(B) 981

(C) 13,060

(D) ^6-,^^0 

3. The capital of the state of Washington is

(A) Seattle

(B) Tacoma

(C) Spokane

(D) Olympia*

4. The largest city in New Jersey is

(A) Jersey City

(B) Newark*

(C) Camden

(D) Paterson

5. The largest city in South Dakota is

(A) Watertown

(B) Sioux Falls*

(C) Aberdeen

(D) Rapid City

6. The 12th President of the United States was

(A) Truman

(B) Taylor*

(C) Madison

(D) Pierce

7. The 1969 population of Trenton was estimated to be

(A) 101,000

(B) 104,000

(C) 2,189

(D) 102,000*

8. Which automobile model is usually considered to be most expensive?

(A) Cadillac*

(B) Ford

(C) Rambler

(D) Plymouth

9. The colors on a 1970 New Mexico license plate are

(A) yellow and black

(B) black and white

(C) yellow and red*

(D) red and white

10. Which state does not border Tennessee?

(A) Missouri

(B) South Carolina*

(C) Virginia

(D) Georgia

11. Which city is closest to the Pocono Mountains?

(A) Philadelphia

(B) Bayonne

(C) Scranton*

(D) Trenton

12. Educational Testing Service is located in which township?

(A) Lawrence*

(B) Princeton

(C) Ewing

(D) Hopewell

13. Which town is not located in Mercer County?

(A) Princeton

(B) Ringoes*

(C) Harbourton

(D) Hightstown

14. On the New Jersey Turnpike, what is the number of the New Brunswick Exit?

(A) 8

(B) 8A

(C) 9*

(D) 10

15. Which day is most likely to be pay-day at ETS?

(A) Wednesday*

(B) Friday

(C) Monday

(D) Thursday



A/S Example: Distribute 100 Points

[Completed sample answer sheet for the 15-item quiz. For each item the
examinee marks an answer and distributes 100 points over alternatives A
through D. Most entries are not legible in this copy; those that can be
read are listed below.]

Item      A      B      C      D

  1       0      0     75     25
 11      30      0     70      0
 12       0    100      0      0
 15     100      0      0      0



A/S Example: Pick-One

[Completed sample answer sheet for the 15-item quiz. For each item the
examinee marks an answer and checks one of five confidence levels: very
sure, sure, fairly sure, not very sure, not sure. Entries as far as they
can be read in this copy:]

Item    Answer          Confidence checked

  1     C               sure
  2     (not legible)   fairly sure
  3     A               sure
  4     B               very sure
  5     (not legible)   not very sure
  6     D               fairly sure
  7     (not legible)   not very sure
  8     A               very sure
  9     (not legible)   not sure
 10     A               not very sure
 11     C               sure
 12     B               very sure
 13     C               (not legible)
 14     C               not sure
 15     (not legible)   very sure



APPENDIX II 



Personality Tests 



Personality Test 1 
Personality Test 2 
Personality Test 3 



TEST 1 



OPINION INVENTORY FORM PQEP 
Directions

In this inventory, you will find a number of statements expressing
opinions with which you may or may not agree. Following each statement
are six boxes labeled as follows:

Strongly    Disagree    Slightly    Slightly    Agree    Strongly
Disagree                Disagree    Agree                Agree

   □           □           □           □          □         □

You are to indicate the degree to which you agree or disagree with
each statement by checking the appropriate box.

You may notice an occasional statement with which you neither
particularly agree nor particularly disagree. If so, do the best you
can by checking the box that seems most appropriate.

Please consider each statement carefully, but do not spend too much
time on any one statement. Do not skip any items.

There are no "right" or "wrong" answers; the only correct responses
are those that are true for you. This inventory is being used for research
purposes only and your responses will be kept strictly confidential.

TURN THE PAGE AND BEGIN.



 1. Sex crimes, such as rape and attacks on children always deserve more
    than mere imprisonment; such criminals must be publicly whipped or worse.

 2. The true American way of life is disappearing so fast that force is
    absolutely necessary to preserve it.

 3. The businessman and the manufacturer are undoubtedly more important to
    society than the artist and the professor.

 4. Nowadays everyone is prying into matters that must remain personal
    and private.

 5. Someday it will certainly be shown that astrology can explain a lot of
    things.

 6. All of our social problems would be solved if we got rid of the immoral,
    crooked, and feebleminded people.

 7. No sane, normal, decent person could ever think of hurting a close friend
    or relative.

 8. Every person should have complete faith in his own independent judgment,
    not in some supernatural power whose decisions he obeys without question.

 9. Nowadays, since democracy demands that people of widely different
    background and station mix together, a person should never be finicky
    about catching a disease from any of them.

10. When a person has a problem or worry he should always drop everything and
    concentrate upon it until the solution appears.

11. We are certainly bound to admire and respect a person if we get to know
    him well.

12. An insult to our honor should always be overlooked, for "whosoever shall
    smite thee on thy right cheek, turn to him the other also."

13. Every truly mature person outgrows childish feelings of submissive respect
    and of excessive love and gratitude for his parents.

14. All attempts to divide people into two distinct classes of the weak and
    the strong are doomed to failure.

15. Science has its place, but there are probably things that might not be
    understood by the human mind.

16. A person who had bad manners, habits, and breeding would probably find it
    hard to get along with decent people.

17. Seldom do weaknesses or difficulties hold us back if we have enough will
    power.

18. Wars and social troubles may someday be ended by an earthquake or flood
    that could destroy the whole world.

19. The wild sex life of the old Greeks and Romans was probably tame compared
    to some of the goings-on in this country, even in places where people
    might least expect it.

20. What this country probably needs is a few courageous, tireless, devoted
    leaders in whom the people can put their faith.

21. Some youth probably need the qualities of strict discipline, rugged
    determination, and the will to work and fight for family and country.

22. The urge to jump from high places is probably learned, not inborn.

23. The rebellious ideas that young people sometimes get should probably be
    encouraged and developed to guarantee mature citizenship in adulthood.

24. Probably few people have learned important things through suffering.

25. A love of freedom and complete independence may be important virtues for
    children to learn.

26. Because human nature is improving, war and conflict may someday be
    eliminated.

27. Homosexuals may not be criminals and probably should not be punished as
    such.

28. If people occasionally talked things over and didn't work so hard, some
    others would probably be better off.

29. In times like these it is often necessary to be more on guard against
    ideas put out by people or groups in one's own camp than by those in the
    opposing camp.

30. Man on his own is a helpless and miserable creature.

31. A group which tolerates too much difference of opinion among its own
    members cannot exist for long.

32. Unfortunately, a good many people with whom I have discussed important
    social and moral problems don't really understand what's going on.

33. Most people just don't know what's good for them.

34. Of all the different philosophies which exist in this world there is
    probably only one which is correct.

35. Most people just don't give a "damn" for others.

36. It is only natural for a person to be rather fearful of the future.

37. In the history of mankind there have probably been just a handful of
    really great thinkers.

38. The worst crime a person could commit is to attack publicly the people who
    believe in the same thing he does.

39. My blood boils whenever a person stubbornly refuses to admit he's wrong.

40. The main thing in life is for a person to want to do something important.

41. A person who gets enthusiastic about too many causes is likely to be a
    pretty "wishy-washy" sort of person.

42. If given the chance I would do something of great benefit to the world.

43. A man who does not believe in some great cause has not really lived.

44. It is only when a person devotes himself to an ideal or cause that life
    becomes meaningful.

45. In the long run the best way to live is to pick friends and associates
    whose tastes and beliefs are the same as one's own.

46. Most of the ideas which get printed nowadays aren't worth the paper they
    are printed on.

47. There are two kinds of people in this world: those who are for the truth
    and those who are against the truth.
□ □ □ 






INSTRUCTIONS: In this task, you will be shown pairs of dice bets that vary
in terms of the chances of winning and losing, and the amounts of money that
can be won or lost. I would like you to choose, in each pair, the bet that
you would prefer to play. Indicate your decision by making a check in either
box A or B in the right hand column below. Consider each pair separately;
do not let your decision in one case influence your decision in another. Later
you will have the opportunity to actually play the bets that you now choose.
You will play them in a dice game for the amounts of money described in the bets.
So be sure that you choose now the bets that you actually will want to play,
because you will be held to them.

The chances of winning and losing are written as fractions. Thus, 1/3 means
1 chance in 3, 1/2 means 1 chance in 2, 2/3 means 2 chances in 3.

Check the box on the right to indicate which bet you choose to make. The box
marked A refers to the left side bet, the box marked B refers to the right
side bet.














                                                                    A  B

 1.  1/3 to win  $1.20      vs.      1/2 to win  $ .30              □  □
     2/3 to lose $ .60               1/2 to lose $ .30

 2.  2/3 to win  $ .15      vs.      1/2 to win  $ .90              □  □
     1/3 to lose $ .30               1/2 to lose $ .90

 3.  1/2 to win  $ .30      vs.      2/3 to win  $ .30              □  □
     1/2 to lose $ .30               1/3 to lose $ .60

 4.  1/2 to win  $ .90      vs.      1/3 to win  $1.20              □  □
     1/2 to lose $ .90               2/3 to lose $ .60

 5.  1/3 to win  $1.80      vs.      2/3 to win  $ .45              □  □
     2/3 to lose $ .90               1/3 to lose $ .90

 6.  2/3 to win  $ .15      vs.      1/2 to win  $ .30              □  □
     1/3 to lose $ .30               1/2 to lose $ .30

 7.  1/3 to win  $1.20      vs.      1/2 to win  $ .60              □  □
     2/3 to lose $ .60               1/2 to lose $ .60

 8.  2/3 to win  $ .15      vs.      1/3 to win  $1.20              □  □
     1/3 to lose $ .30               2/3 to lose $ .60

 9.  2/3 to win  $ .15      vs.      1/3 to win  $ .60              □  □
     1/3 to lose $ .30               2/3 to lose $ .30

10.  1/3 to win  $1.80      vs.      1/2 to win  $ .30              □  □
     2/3 to lose $ .90               1/2 to lose $ .30

11.  2/3 to win  $ .45      vs.      2/3 to win  $ .15              □  □
     1/3 to lose $ .90               1/3 to lose $ .30

12.  1/3 to win  $1.20      vs.      2/3 to win  $ .30              □  □
     2/3 to lose $ .60               1/3 to lose $ .60

13.  1/2 to win  $ .90      vs.      2/3 to win  $ .45              □  □
     1/2 to lose $ .90               1/3 to lose $ .90

14.  1/2 to win  $ .90      vs.      1/2 to win  $ .60              □  □
     1/2 to lose $ .90               1/2 to lose $ .60

15.  1/3 to win  $ .60      vs.      1/2 to win  $ .30              □  □
     2/3 to lose $ .30               1/2 to lose $ .30

16.  2/3 to win  $ [illegible]  vs.  1/2 to win  $ .60              □  □
     1/3 to lose $ .90               1/2 to lose $ .60

17.  2/3 to win  $ [illegible]  vs.  1/2 to win  $ .30              □  □
     1/3 to lose $ .90               1/2 to lose $ .30

18.  1/2 to win  $ .30      vs.      1/2 to win  $ .90              □  □
     1/2 to lose $ .30               1/2 to lose $ .90

19.  2/3 to win  $ .45      vs.      1/3 to win  $1.20              □  □
     1/3 to lose $ .90               2/3 to lose $ .60

20.  1/2 to win  $ .90      vs.      2/3 to win  $ .30              □  □
     1/2 to lose $ .90               1/3 to lose $ .60

21.  1/3 to win  $1.20      vs.      1/3 to win  $ .60              □  □
     2/3 to lose $ .60               2/3 to lose $ .30

22.  2/3 to win  $ .15      vs.      1/3 to win  $1.80              □  □
     1/3 to lose $ .30               2/3 to lose $ .90

23.  1/2 to win  $ .90      vs.      1/3 to win  $1.80              □  □
     1/2 to lose $ .90               2/3 to lose $ .90

24.  2/3 to win  $ .30      vs.      1/2 to win  $ .60              □  □
     1/3 to lose $ .60               1/2 to lose $ .60

25.  1/3 to win  $1.20      vs.      1/3 to win  $1.80              □  □
     2/3 to lose $ .60               2/3 to lose $ .90

26.  2/3 to win  $ .15      vs.      1/2 to win  $ .60              □  □
     1/3 to lose $ .30               1/2 to lose $ .60
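The bets in the pairs above appear to be built so that each one has an expected value of zero, which would mean a preference within a pair reflects taste for risk (odds and stakes) rather than expected gain. That reading is an inference from the listed amounts, not something the instructions state. The sketch below checks it with exact fractions for the nine distinct bet types that can be read in the list; amounts are in cents, and the function and variable names are hypothetical.

```python
from fractions import Fraction


def expected_value(p_win: Fraction, win_cents: int, lose_cents: int) -> Fraction:
    """Expected gain in cents: p(win) * win amount - p(lose) * loss amount."""
    return p_win * win_cents - (1 - p_win) * lose_cents


# Distinct bet types legible in the list: (chance of winning, win, loss).
BET_TYPES = [
    (Fraction(1, 3), 120, 60),
    (Fraction(1, 3), 180, 90),
    (Fraction(1, 3), 60, 30),
    (Fraction(1, 2), 30, 30),
    (Fraction(1, 2), 60, 60),
    (Fraction(1, 2), 90, 90),
    (Fraction(2, 3), 15, 30),
    (Fraction(2, 3), 30, 60),
    (Fraction(2, 3), 45, 90),
]

if __name__ == "__main__":
    for p, win, lose in BET_TYPES:
        print(f"{p} to win {win}c / lose {lose}c: EV = {expected_value(p, win, lose)}")
```

The two bets whose win amounts are illegible (pairs 16 and 17) are left out of the check rather than guessed.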



TEST 3

PERSONAL INVENTORY - FORM PQEP
Directions

This inventory consists of a number of statements about feelings,
tendencies, and preferences that may or may not be characteristic of
you. Following each statement are six boxes labeled as follows:

UNCHARACTERISTIC        CHARACTERISTIC

    □  □  □                □  □  □

Notice that there are three boxes on the left labeled Uncharacteristic
with three gradations of difference ranging from Somewhat through Moderately
to Definitely Uncharacteristic. Likewise there are three boxes on the right
labeled Characteristic with three gradations of difference ranging from
Somewhat through Moderately to Definitely Characteristic. You are to
indicate the degree to which each statement is characteristic of you by
checking the appropriate box.

You may notice an occasional statement that is neither particularly
characteristic nor particularly uncharacteristic of you. If so, do the best
you can by checking the box that seems most appropriate.

Please consider each statement carefully, but do not spend too much
time on any one item. Do not skip any items.

There are no "right" or "wrong" answers; the only correct responses
are those that are true for you. This inventory is being used for research
purposes only and your responses will be kept strictly confidential.

TURN THE PAGE AND BEGIN



 1. During exams or tests, I block on questions to which I know the answers,
    even though I might remember them as soon as the exam is over.

 2. The more important the examination, the less well I seem to do.

 3. Time pressure on an exam causes me to do worse than the rest of the group
    under similar conditions.

 4. I find that my mind goes blank at the beginning of an exam, and it takes
    me a few minutes before I can function.

 5. Nervousness while taking an exam or test hinders me from doing well.

 6. I find myself reading exam questions without understanding them, and I
    must go back over them, so that they will make sense.

 7. When I don't do well on a difficult item at the beginning of an exam, it
    tends to upset me so that I block on even easy questions later on.

 8. In a course where I have been doing poorly, my fear of a bad grade cuts
    down my efficiency.

 9. When I am poorly prepared for an exam or test, I get upset, and do less
    well than even my restricted knowledge should allow.

10. I am so tired from worrying about an exam, that I find I almost don't
    care how well I do by the time I start the test.

11. While I may (or may not) be nervous before taking an exam, once I start,
    I seem to forget to be nervous.

12. I look forward to exams.

13. The more important the exam or test, the better I seem to do.

14. When I start a test, nothing is able to distract me.

15. Although "cramming" under pre-examination tension is not effective for
    most people, I find that if the need arises, I can learn material
    immediately before an exam, even under considerable pressure, and
    successfully retain it to use on the exam.

16. In courses in which the total grade is based mainly on one exam, I seem to
    do better than other people.

17. I enjoy taking a difficult exam more than an easy one.

18. I work most effectively under pressure, as when the task is very
    important.

19. Nervousness while taking a test helps me do better.

20. I am often one of the first to give up trying to do a thing.

21. I dislike work that requires a great deal of attention to detail.

22. I am seldom methodical in the things that I do.

23. I often don't finish tasks I start, sometimes even if they are very
    important.

24. It doesn't bother me to change my plans in the midst of an undertaking.

25. I do not work and study following a strict schedule.

26. Occasionally, I have done something dangerous just for the thrill of it.

27. I do not believe that promptness is a very important personality
    characteristic.

28. I enjoy having to adapt myself to new and unusual situations.

29. I am not always careful about my manner of dress.

30. I seldom become so wrapped up in something I am doing that I find it
    difficult to turn my attention to other matters.

31. My interests tend to change quickly.

32. I am inclined to go from one activity to another without continuing with
    any one for too long a time.

33. I think it is usually wise to do things in a conventional way.

34. I find it easy to stick to a certain schedule, once I have started it.

35. I often find myself thinking of the same tunes or phrases for days at
    a time.

36. I always put on and take off my clothes in the same order.

37. I usually check more than once to be sure that I have locked a door, put
    out the light, or something of the sort.

38. I usually find that my own way of attacking a problem is best, even
    though it doesn't always seem to work in the beginning.

39. I try to follow a program of life based on duty.

40. I prefer to stop and think before I act even on trifling matters.

41. I usually maintain my own opinions even though many other people may have
    a different point of view.

42. I never miss going to church.

43. There is usually only one best way to solve most problems.

44. I usually dislike to set aside a task that I have undertaken until it is
    finished.

45. I am usually able to keep at a job longer than most people.

46. I scan newspapers rather than read them carefully.

47. I let myself "go" at a party.

48. As a youngster I enjoyed taking part in reckless stunts.

49. I like a great deal of variety in my work.

50. My friends consider me to be happy-go-lucky.

51. I like being where there is something going on all the time.

52. I like work that has lots of excitement.

53. I change my plans often.

54. I like to take a chance just for the excitement.

55. I often make people laugh.

56. I like to do things on the spur of the moment.

57. I like to work crossword puzzles.

58. I like detailed work.

59. I like to play chess.

60. I usually notice the furniture arrangements in a strange house.

61. I don't like having my plans changed.

62. I like mathematics.

63. I usually think before I leap.

64. I like to solve complex problems.

65. I like work requiring patience and carefulness.

66. I don't like changes.

67. I consider myself always careful.

68. When I have to carry through some project, I prefer working on it with
    interested colleagues, rather than on my own.

69. I often feel that I must discuss something I've read before I'll really
    understand or remember it.

70. I usually solve a problem better by discussing it with others than by
    studying it alone.

71. I find it helpful to discuss a problem with others before coming to a
    decision.

72. I would prefer to learn about something by class discussion, rather than
    by reading a book on the subject in my own time.

73. I would prefer a teacher who neglects me and leaves me to my own devices
    over one who continually watches me and makes suggestions.

74. When required to make a number of decisions in a comparatively short
    time, I prefer to make them alone rather than with the help of others.

75. I am bothered when someone offers me advice I didn't ask for.

76. I like to do my planning alone, without suggestions from or discussions
    with other people.

77. I like working alone.


APPENDIX III

Types of Testing Formats

Directions for Multiple Choice
Answer Sheet for Multiple Choice
Directions for Multiple Choice: Pick-One
Answer Sheet for Pick-One
Directions for Multiple Choice: Distribute 100 Points
Answer Sheet for Distribute 100 Points



DIRECTIONS FOR MULTIPLE CHOICE

You are advised to use your time effectively and to work as rapidly as you can
without losing accuracy. Do not waste your time on questions that are too
difficult for you. Go on to the other questions and come back to the difficult
ones later if you can.

Be sure you understand the directions before attempting to answer any questions.

YOU ARE TO INDICATE ALL YOUR ANSWERS ON THE SEPARATE ANSWER SHEET. No credit
will be given for anything written on any other paper. After you have decided
which of the suggested answers is correct, mark the answer in the space on the
right. Give only one answer to each question. If you change an answer, be sure
that all previous marks are erased completely.

EXAMPLE

1. Chicago is a

(A) state
(B) city
(C) country
(D) continent

SAMPLE ANSWER

1.  B



Answer Sheet for Multiple Choice

 1. ____
 2. ____
 3. ____
 4. ____
 5. ____
 6. ____
 7. ____
 8. ____
 9. ____
10. ____
11. ____
12. ____
13. ____
14. ____
15. ____



DIRECTIONS FOR MULTIPLE CHOICE: PICK-ONE

The test that you are about to take is a little different than most other
tests you have taken while in the Air Force. You are advised to use your
time effectively and to work as rapidly as you can without losing accuracy.
Do not waste your time on questions that are too difficult for you. Go on
to the other questions and come back to the difficult ones later if you can.

BE SURE YOU UNDERSTAND THE DIRECTIONS BEFORE ATTEMPTING TO ANSWER ANY
QUESTIONS.

YOU ARE TO INDICATE ALL YOUR ANSWERS ON THE SEPARATE ANSWER SHEET. No credit
will be given for anything written on any other paper. Read the question
carefully and read each alternative. After you have decided which of the
suggested answers is correct, mark that answer on the line on the left hand
side of the answer sheet in the space beside the question number. BE SURE
THAT YOU MARK THE ANSWER CLEARLY. Give only one answer to each question.
If you change your answer, be sure that all previous marks are erased
completely.

Now, you are asked to indicate how sure you are that the answer you just
marked was correct. This is to be indicated in the column to the right of
your first mark. If you are very sure your answer was the correct one, check
the top line. If you cannot make such a strong statement about your answer
but are sure your answer was correct, mark the second line from the top. If
you are fairly sure your answer was correct, mark the middle line. If you
are not very sure of your answer but are not making a complete guess, mark
the fourth line from the top. If each alternative seems equally possible and
you guess one from these, mark the bottom line.



EXAMPLE 1

Chicago is a

(A) state
(B) city
(C) country
(D) village

EXAMPLE 1.  B        X  very sure
                    ___ sure
                    ___ fairly sure
                    ___ not very sure
                    ___ not sure

In the above example the subject was 100% sure that the answer he marked
was the correct answer.

EXAMPLE 2

Rantoul is in what Illinois county?

(A) Ford
(B) Champaign
(C) Mercer
(D) Greene

EXAMPLE 2.  B       ___ very sure
                    ___ sure
                     X  fairly sure
                    ___ not very sure
                    ___ not sure

In the case above the subject quickly eliminated (C) and (D). He was not
sure of his choice between the other two possible answers. He knew that
it was either Ford or Champaign but did not know which of the two was
correct. He decided to pick B and indicated his lack of sureness by
marking the middle line.

EXAMPLE 3

Who is the present Postmaster General?

(A) Volpe
(B) Richardson
(C) Blount
(D) Hickel

EXAMPLE 3.  C       ___ very sure
                    ___ sure
                    ___ fairly sure
                     X  not very sure
                    ___ not sure

In this case the subject knew that Hickel was the Secretary of the Interior
and so that could be eliminated. He had heard the other three names on the
news recently but couldn't remember their cabinet offices. For some reason
Blount's name seemed to stick in his mind and so he chose this as the correct
alternative. Since he couldn't really tell the difference between the first
three alternatives, he indicated that he was not very sure of his answer.

EXAMPLE 4

The capital of Illinois is

(A) Chicago
(B) Springfield
(C) Rantoul
(D) Decatur

EXAMPLE 4.  B       ___ very sure
                     X  sure
                    ___ fairly sure
                    ___ not very sure
                    ___ not sure

In this case the subject knew that neither Rantoul nor Decatur was the
capital city. He thought that Chicago might be the capital, but there was
only a slim chance of this being correct. He therefore chose (B) as being
correct and indicated that he was not completely sure of his answer by
checking the second line.

EXAMPLE 5

Webster's NEW COLLEGIATE DICTIONARY has how many pages?

(A) 1175
(B) 1352
(C) 1189
(D) 2174

EXAMPLE 5.  D       ___ very sure
                    ___ sure
                    ___ fairly sure
                    ___ not very sure
                     X  not sure

In this case the subject didn't know which one was the answer. Each
alternative looked like it was possible. The subject guessed at D being
the correct answer.



o 

ERIC 





I 



95 



i 

i 



?>*■ 



r 

I; 

IV: 

<» . 

ti: 

K. 



{: 

r. 

I 

K 

I 






Answer Sheet for Pick-One 

 1. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 2. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 3. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 4. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 5. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 6. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 7. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 8. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
 9. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
10. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
11. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
12. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
13. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
14. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure
15. _____   ___ very sure   ___ sure   ___ fairly sure   ___ not very sure   ___ not sure



DIRECTIONS FOR MULTIPLE CHOICE: DISTRIBUTE 100 POINTS



The test that you are about to take is a little different than most other 
tests you have taken while in the Air Force. You are advised to use your 
time effectively and to work as rapidly as you can without losing accuracy. 

Do not waste your time on questions that are too difficult for you. Go on 
to the other questions and come back to the difficult ones later if you can. 

Be sure you understand the directions before attempting to answer any
questions.

YOU ARE TO INDICATE ALL YOUR ANSWERS ON THE SEPARATE ANSWER SHEET. No credit
will be given for anything written on any other paper. Read the question
carefully and read each alternative. After you have decided which of the
suggested answers is correct, mark that answer on the line on the left hand
side of the answer sheet in the space beside the question number. BE SURE THAT
YOU MARK THE ANSWER CLEARLY. Give only one answer to each question. If you
change your answer, be sure that all previous marks are erased completely.

Now you are asked to indicate how sure you are that the answer you gave was
correct. You are given 100 points to distribute over the possible alternatives.
You are to distribute the 100 points over the alternatives as they appear on
the right hand column of your answer sheet. If you are completely sure of your
answer, place your 100 points all on that alternative. If you are making a
complete guess, place 25 points on each alternative.

The first step is to decide what alternatives are completely wrong and place
zero points on those alternatives. You should show your confidence in an 
alternative by the number of points you assign to it. The alternative which 
you marked on the left should always have the largest number of points placed 
beside it on the right. You should keep in mind that the points you indicate 
to the right should add up to 100, no more and no less. 
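
The rules in the two paragraphs above can be checked mechanically before a sheet is scored. The following is a minimal sketch in Python and is not part of the original test materials; the function name `validate_response` and the dictionary layout for a response are assumptions made for illustration.

```python
def validate_response(marked, points):
    """Return a list of problems found in one Distribute 100 Points response.

    marked -- the alternative written on the left of the answer sheet, e.g. "B"
    points -- dict mapping each alternative to the points placed beside it
    """
    problems = []
    # No alternative may receive fewer than zero points.
    if any(p < 0 for p in points.values()):
        problems.append("points must not be negative")
    # The points on the right must add up to 100, no more and no less.
    if sum(points.values()) != 100:
        problems.append("points must add up to 100")
    # The marked answer should carry the largest number of points (ties allowed).
    if marked not in points:
        problems.append("marked answer is not one of the alternatives")
    elif points[marked] < max(points.values()):
        problems.append("marked answer must carry the largest number of points")
    return problems


if __name__ == "__main__":
    print(validate_response("B", {"A": 0, "B": 100, "C": 0, "D": 0}))
    print(validate_response("A", {"A": 20, "B": 60, "C": 20, "D": 10}))
```

A tie for the largest number of points is accepted in this sketch, since the directions tell an examinee who is making a complete guess to place 25 points on every alternative.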



EXAMPLE 1

Chicago is a

(A) state
(B) city
(C) country
(D) village

EXAMPLE 1   B     0 (A)
                100 (B)
                  0 (C)
                  0 (D)

In the example above the subject was 100% sure that the answer he marked
was the correct answer.








EXAMPLE 2

Webster's NEW COLLEGIATE DICTIONARY has how many pages?

(A) 1175
(B) 1352
(C) 1189
(D) 1174

EXAMPLE 2   D    25 (A)
                 25 (B)
                 25 (C)
                 25 (D)

In this case the subject didn't know the answer. As far as he was
concerned each of the alternatives was equally possible. He chose
D as being his response, but this was only a guess.

EXAMPLE 3



The capitol of Illinois is

(A) Chicago
(B) Springfield
(C) Rantoul
(D) Decatur

EXAMPLE 3   B    25 (A)
                 75 (B)
                  0 (C)
                  0 (D)

In this case the subject knew that neither Rantoul nor Decatur was the
capitol for sure. He thought that Chicago might be the capitol but
there was only a slim chance of that being true. He was pretty sure
that Springfield was the capitol but did not want to place all of his
confidence in that answer.

EXAMPLE 4

Who is the present Postmaster General?

(A) Volpe
(B) Richardson
(C) Blount
(D) Hickle

EXAMPLE 4   C    20 (A)
                 20 (B)
                 60 (C)
                  0 (D)

In the case above the subject knew that Hickle was the Secretary of the
Interior and therefore not the Postmaster General. He therefore places
no weight on alternative D. He has heard the other three names recently
on the radio but does not know for sure which one is the Postmaster General.
He decides that Blount is probably the correct answer and decides to give
that alternative 60 points. The other two alternatives appear to be equally
likely so he gives them each 20 points.

EXAMPLE 5

Rantoul is in what Illinois county?

(A) Ford
(B) Champaign
(C) Mercer
(D) Greene

EXAMPLE 5   B    50 (A)
                 50 (B)
                  0 (C)
                  0 (D)

In this case the subject quickly eliminated C and D. He therefore assigned
them zero points. The choice between A and B was a toss-up. He decided
to choose B but this was only a guess. He felt that alternative A had the
same chance of being correct.



Many people wonder how to mark their confidence in terms of distributing
the 100 points so that they will make the highest possible score. This
test is scored in such a way that you will do your best in the long run
if you distribute the 100 points as honestly as you possibly can. The
more points you place on the right answer the higher will be your score.

Only the number of points placed on the correct answer will be scored.

If you place all of your points on only one alternative and that alternative
is wrong, you will not get any score. On the other hand, if you put 25
points on each alternative, you will not receive as high a score as you
would if you placed more points on the correct answer.
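
The statement that only the points placed on the correct answer are scored can be illustrated with a short tally. This is a sketch in Python under stated assumptions: it counts raw points only, the function names `item_score` and `raw_total` are hypothetical, and the three allocations are illustrative values rather than data from the study. Any further transformation the scorers may have applied to these points is not described in the directions and is not modeled here.

```python
def item_score(points, correct):
    """Points the examinee placed on the keyed (correct) alternative."""
    return points.get(correct, 0)


def raw_total(responses, key):
    """Sum of the points placed on the keyed alternative over all items."""
    return sum(item_score(points, correct)
               for points, correct in zip(responses, key))


# Illustrative allocations for three items whose keyed answer is B.
responses = [
    {"A": 0, "B": 100, "C": 0, "D": 0},    # all points on the right answer
    {"A": 100, "B": 0, "C": 0, "D": 0},    # all points on a wrong answer
    {"A": 25, "B": 25, "C": 25, "D": 25},  # complete guess
]
key = ["B", "B", "B"]

if __name__ == "__main__":
    for points, correct in zip(responses, key):
        print(item_score(points, correct))
    print("total:", raw_total(responses, key))
```

The three cases correspond to the situations the directions describe: full credit for complete and correct confidence, no score for complete confidence in a wrong alternative, and a partial score for an even split.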



REMEMBER THIS: THE MORE ACCURATELY YOU DISTRIBUTE THE 100 POINTS, THE
HIGHER YOU CAN EXPECT YOUR SCORE TO BE.

ALSO: BE SURE YOUR POINTS ADD TO 100.










Many people wonder how to mark their sureness for questions about which they
are not completely certain. As it turns out, the test is scored so that you
will do your best in the long run if you indicate how sure you were as honestly
as you possibly can. If you mark your answer as being very sure and
your answer is wrong, you will get a lower score than you would if you had
marked not sure. On the other hand, it doesn't pay to be too cautious.
If you mark an answer as being not sure when in fact it is correct, you get
a lower score than you would had you marked very sure.

REMEMBER THIS: THE MORE ACCURATELY YOU INDICATE YOUR SURENESS, THE HIGHER YOU
CAN EXPECT YOUR SCORE TO BE.
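
The ordering described in the paragraph above can be sketched with a simple weighting. The directions do not give the actual Pick-One scoring weights, so the symmetric weights of 1 through 5 used below are purely hypothetical, as is the function name `pick_one_score`. The sketch only shows one rule that has the two stated properties: a confident wrong answer scores below an unsure wrong answer, and an unsure right answer scores below a confident right answer.

```python
# Sureness levels on the Pick-One answer sheet, from lowest to highest.
LEVELS = ["not sure", "not very sure", "fairly sure", "sure", "very sure"]


def pick_one_score(is_correct, sureness):
    """Hypothetical item score: +weight if correct, -weight if wrong.

    The weight runs from 1 ("not sure") to 5 ("very sure"). These values are
    assumptions for illustration, not the weights used in the study.
    """
    weight = LEVELS.index(sureness) + 1
    return weight if is_correct else -weight


if __name__ == "__main__":
    for level in LEVELS:
        print(level, pick_one_score(True, level), pick_one_score(False, level))
```

Any weighting in which the score rises with sureness for right answers and falls with sureness for wrong answers would match the two statements in the directions equally well.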



 6. _____   ___ A.   ___ B.   ___ C.   ___ D.
 7. _____   ___ A.   ___ B.   ___ C.   ___ D.
 8. _____   ___ A.   ___ B.   ___ C.   ___ D.
 9. _____   ___ A.   ___ B.   ___ C.   ___ D.
10. _____   ___ A.   ___ B.   ___ C.   ___ D.

APPENDIX IV

Student Attitudes

Attitudes Towards Testing Form CON1
Attitudes Towards Testing Form CON2
Attitudes Towards Testing Form CON3






ATTITUDES TOWARDS TESTING FORM CON1

1. How well do you think the results of the test showed what you really know?

   A. _____ very satisfactorily
   B. _____ satisfactorily
   C. _____ so-so
   D. _____ not very satisfactorily
   E. _____ not satisfactorily

2. To what extent are you satisfied with classroom testing as it helps you
   find out what you know and what you don't know?

   A. _____ to a large extent
   B. _____ to an extent
   C. _____ to some extent
   D. _____ to a little extent
   E. _____ not at all

3. How satisfied, in general, are you with the testing that takes place in
   this class?

   A. _____ very satisfied
   B. _____ satisfied
   C. _____ I can take it or leave it
   D. _____ not very satisfied
   E. _____ not satisfied

4. How advantageous do you find testing in your course?

   A. _____ very advantageous
   B. _____ advantageous
   C. _____ doubtful
   D. _____ disadvantageous
   E. _____ very disadvantageous

IF YOU HAVE EVER BEEN IN A REMEDIAL SESSION, ANSWER NUMBERS 5 AND 6, OTHERWISE
DO NOT ANSWER THEM.

5. How satisfied are you with classroom testing as it helps you with
   remediation?

   A. _____ very satisfied
   B. _____ satisfied
   C. _____ so-so
   D. _____ not very satisfied
   E. _____ not satisfied

6. How well do your test results help you during remediation?

   A. _____ very well
   B. _____ well
   C. _____ I can't really say
   D. _____ not very well
   E. _____ not well






ATTITUDES TOWARDS TESTING FORM CON2

1. How well do you think the results of the test showed what you really
   know?

   A. _____ very satisfactorily
   B. _____ satisfactorily
   C. _____ so-so
   D. _____ not very satisfactorily
   E. _____ not satisfactorily

2. To what extent are you satisfied with classroom testing as it helps
   you find out what you know and what you don't know?

   A. _____ to a large extent
   B. _____ to an extent
   C. _____ to some extent
   D. _____ to a little extent
   E. _____ not at all

3. How satisfied, in general, are you with the testing that takes place in
   this class?

   A. _____ very satisfied
   B. _____ satisfied
   C. _____ I can take it or leave it
   D. _____ not very satisfied
   E. _____ not satisfied

4. How advantageous do you find testing in your course?

   A. _____ very advantageous
   B. _____ advantageous
   C. _____ doubtful
   D. _____ disadvantageous
   E. _____ very disadvantageous

IF YOU HAVE BEEN TO A REMEDIAL SESSION, ANSWER NUMBERS 5 AND 6, OTHERWISE,
GO TO NUMBER 7.

5. How satisfied are you with classroom testing as it helps you with
   remediation?

   A. _____ very satisfied
   B. _____ satisfied
   C. _____ so-so
   D. _____ not very satisfied
   E. _____ not satisfied

6. How well do your test results help you during remediation?

   A. _____ very well
   B. _____ well
   C. _____ I can't really say
   D. _____ not very well
   E. _____ not well



7. How important was it for you to score high on your tests?

   A. _____ very important
   B. _____ important
   C. _____ so-so
   D. _____ not very important
   E. _____ not important

8. How well did you understand how the tests were graded?

   A. _____ very well
   B. _____ well
   C. _____ fairly well
   D. _____ not very well
   E. _____ not at all

9. How well did you understand the ways of marking your answers?

   A. _____ very well
   B. _____ well
   C. _____ fairly well
   D. _____ not very well
   E. _____ not at all

10. When you came to a question that you did not know, how accurately
    do you think you marked your confidence?

   A. _____ very accurately
   B. _____ accurately
   C. _____ fairly accurately
   D. _____ not very accurately
   E. _____ not accurately

11. How easy was it for you to decide how to mark your answers?

   A. _____ very easy
   B. _____ easy
   C. _____ so-so
   D. _____ fairly difficult
   E. _____ difficult

12. How well do you think your confidence marks showed your real confidence?

   A. _____ very well
   B. _____ well
   C. _____ fairly well
   D. _____ not very well
   E. _____ not at all

13. How honest were you in making your confidence marks?

   A. _____ very honest
   B. _____ honest
   C. _____ fairly honest
   D. _____ not very honest
   E. _____ not honest







14. How comfortable did you feel when you took tests where you marked
    your confidence?

   A. _____ very comfortable
   B. _____ comfortable
   C. _____ a little uneasy
   D. _____ uneasy
   E. _____ uncomfortable

15. How often did you gamble in answering?

   A. _____ very often
   B. _____ often
   C. _____ a few times
   D. _____ once or twice
   E. _____ never

16. To what extent do you think it was in your best interest to mark
    your confidence accurately?

   A. _____ to a large extent
   B. _____ to an extent
   C. _____ to a slight extent
   D. _____ to a small extent
   E. _____ not at all

17. How well did you like taking multiple choice tests where you picked the
    answer you thought was right and then told how sure you were the answer
    was right?

   A. _____ very well
   B. _____ well
   C. _____ so-so
   D. _____ not very well
   E. _____ not well

18. To what degree has your prior experience in taking tests by more
    conventional methods affected your use of the testing format that
    you have been using in this class?

   A. _____ to a great degree
   B. _____ to a degree
   C. _____ to a slight degree
   D. _____ to a very small degree
   E. _____ not at all

19. Compared to the usual method of taking a test, how would you rate the
    testing you have been using as a method of relearning?

   A. _____ very well
   B. _____ well
   C. _____ I can't really say
   D. _____ not very well
   E. _____ not well









20. To what extent do you agree or disagree with these strengths and
    weaknesses of the confidence testing system you have been using?
    Check the appropriate blank for each statement.

                           strongly                       don't agree   do not
                           agree      agree    doubtful   that much     agree

It better identifies
my strengths and
weaknesses.                _____      _____    _____      _____         _____

It allows instructor
to better reteach
material.                  _____      _____    _____      _____         _____

It reduces guessing.       _____      _____    _____      _____         _____

It identifies level
of knowledge helpful
to me.                     _____      _____    _____      _____         _____

It is fairer than
most systems.              _____      _____    _____      _____         _____

It requires more
thought before making
a response.                _____      _____    _____      _____         _____

It is a useful device
for relearning
material.                  _____      _____    _____      _____         _____

It is difficult to
overcome old testing
habits.                    _____      _____    _____      _____         _____

It tends to make me
lose confidence in
selecting one answer.      _____      _____    _____      _____         _____

It tends to make me
hedge in answers -
play safe.                 _____      _____    _____      _____         _____

Uninformed students
try to beat the
system.                    _____      _____    _____      _____         _____

The method encourages
guessing.                  _____      _____    _____      _____         _____




















ATTITUDES TOWARDS TESTING FORM CON3

1. How well do you think the results of the test showed what you really know?

   A. _____ very satisfactorily
   B. _____ satisfactorily
   C. _____ so-so
   D. _____ not very satisfactorily
   E. _____ not satisfactorily

2. To what extent are you satisfied with classroom testing as it helps
   you find out what you know and what you don't know?

   A. _____ to a large extent
   B. _____ to an extent
   C. _____ to some extent
   D. _____ to a little extent
   E. _____ not at all

3. How satisfied, in general, are you with the testing that takes place in
   this class?

   A. _____ very satisfied
   B. _____ satisfied
   C. _____ I can take it or leave it
   D. _____ not very satisfied
   E. _____ not satisfied

4. How advantageous do you find testing in your course?

   A. _____ very advantageous
   B. _____ advantageous
   C. _____ doubtful
   D. _____ disadvantageous
   E. _____ very disadvantageous

IF YOU HAVE BEEN TO A REMEDIAL SESSION, ANSWER NUMBERS 5 AND 6, OTHERWISE, GO
TO NUMBER 7.

5. How satisfied are you with classroom testing as it helps you with
   remediation?

   A. _____ very satisfied
   B. _____ satisfied
   C. _____ so-so
   D. _____ not very satisfied
   E. _____ not satisfied

6. How well do your test results help you during remediation?

   A. _____ very well
   B. _____ well
   C. _____ I can't really say
   D. _____ not very well
   E. _____ not well






7. How important was it for you to score high on your tests?

   A. _____ very important
   B. _____ important
   C. _____ so-so
   D. _____ not very important
   E. _____ not important

8. How well did you understand how the tests were graded?

   A. _____ very well
   B. _____ well
   C. _____ fairly well
   D. _____ not very well
   E. _____ not at all

9. How well did you understand the ways of marking your answers?

   A. _____ very well
   B. _____ well
   C. _____ fairly well
   D. _____ not very well
   E. _____ not at all

10. When you came to a question that you did not know, how accurately do
    you think you marked your confidence?

   A. _____ very accurately
   B. _____ accurately
   C. _____ fairly accurately
   D. _____ not very accurately
   E. _____ not accurately

11. How easy was it for you to decide how to mark your answers?

   A. _____ very easy
   B. _____ easy
   C. _____ so-so
   D. _____ fairly difficult
   E. _____ difficult

12. How well do you think your confidence marks showed your real confidence?

   A. _____ very well
   B. _____ well
   C. _____ fairly well
   D. _____ not very well
   E. _____ not at all

13. How honest were you in making your confidence marks?

   A. _____ very honest
   B. _____ honest
   C. _____ fairly honest
   D. _____ not very honest
   E. _____ not honest












14. How comfortable did you feel when you took tests where you marked
    your confidence?

   A. _____ very comfortable
   B. _____ comfortable
   C. _____ a little uneasy
   D. _____ uneasy
   E. _____ uncomfortable

15. How often did you gamble in answering?

   A. _____ very often
   B. _____ often
   C. _____ a few times
   D. _____ once or twice
   E. _____ never

16. To what extent do you think it was in your best interest to mark your
    confidence accurately?

   A. _____ to a large extent
   B. _____ to an extent
   C. _____ to a slight extent
   D. _____ to a small extent
   E. _____ not at all

17. How difficult was it for you to distribute the 100 points and make sure
    they added to 100?

   A. _____ very difficult
   B. _____ difficult
   C. _____ so-so
   D. _____ not very difficult
   E. _____ not difficult

18. How well did you like taking multiple choice tests where you distributed
    100 points over the possible answers?

   A. _____ very well
   B. _____ well
   C. _____ I can't really say
   D. _____ not very well
   E. _____ not well

19. To what degree has your prior experience in taking tests by more
    conventional methods affected your use of the testing format that you
    have been using in this class?

   A. _____ to a great degree
   B. _____ to a degree
   C. _____ to a slight degree
   D. _____ to a very small degree
   E. _____ not at all








20. Compared to the usual method of taking a test, how would you rate the
    testing you have been using as a method of relearning?

   A. _____ very well
   B. _____ well
   C. _____ I can't really say
   D. _____ not very well
   E. _____ not well

21. To what extent do you agree or disagree with these strengths and weaknesses
    of the confidence testing system you have been using? Check appropriate
    blank for each statement.

                           strongly                       don't agree   do not
                           agree      agree    doubtful   that much     agree

It better identifies
my strengths and
weaknesses.                _____      _____    _____      _____         _____

It allows instructor
to better reteach
material.                  _____      _____    _____      _____         _____

It reduces guessing.       _____      _____    _____      _____         _____

It identifies level
of knowledge helpful
to me.                     _____      _____    _____      _____         _____

It is fairer than
most systems.              _____      _____    _____      _____         _____

It requires more
thought before making
a response.                _____      _____    _____      _____         _____

It is a useful device
for relearning
material.                  _____      _____    _____      _____         _____

It is difficult to
overcome old testing
habits.                    _____      _____    _____      _____         _____

It tends to make me
lose confidence in
selecting one answer.      _____      _____    _____      _____         _____

It tends to make me
hedge in answers -
play safe.                 _____      _____    _____      _____         _____

Uninformed students
try to beat the
system.                    _____      _____    _____      _____         _____

The method encourages
guessing.                  _____      _____    _____      _____         _____






APPENDIX V

Detailed Analysis of Block Scores

Table 1.  Means and Standard Deviations of Block Scores by Type
          of Testing, Type of Remediation, and Shift in AGE
Table 2.  Multivariate Test of the Type of Testing x Shift
          Interaction in AGE Using Wilks' Lambda Criterion
Table 3.  Multivariate Test of the Type of Testing Effects
          Within Shift D in AGE
Table 4.  Multivariate Test of the Type of Testing Effects
          Within Shift C in AGE
Table 5.  Multivariate Test of the Type of Testing Effects
          Within Shift B in AGE
Table 6.  Multivariate Test of the Type of Testing Effects
          Within Shift A in AGE
Table 7.  Means and Standard Deviations of Block Scores by Type
          of Testing, Type of Remediation, and Shift in JEM
Table 8.  Multivariate Test of the Type of Testing x Type of
          Remediation x Shift Interaction in JEM Using Wilks'
          Lambda Criterion
Table 9.  Multivariate Test of the Type of Testing Effects Within
          Shift B Special Remediation in JEM
Table 10. Multivariate Test of the Type of Testing Effects Within
          Shift A Special Remediation in JEM
Table 11. Multivariate Test of the Type of Testing Effects Within
          Shift B Control Remediation in JEM
Table 12. Multivariate Test of the Type of Testing Effects Within
          Shift A Control Remediation in JEM






Table 1

Means and Standard Deviations of Block Scores by
Type of Testing, Type of Remediation, and Shift in AGE



Table 2

Multivariate Test of the Type of Testing x Shift Interaction
in AGE Using Wilks' Lambda Criterion

                           Degrees of     Degrees of
                           Freedom for    Freedom for    p less
Test of Roots      F       Hypothesis     Error          than

1 Through 3      2.584         18          484.146       0.001
2 Through 3      1.891         10          474.957       0.044
3 Through 3      0.062          4          456.715       0.993



Table 3

Multivariate Test of the Type of Testing
Effects Within Shift D in AGE

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      1.769          6           342.0        0.105
2 Through 2      0.848          2           171.5        0.430

Univariate F tests

                                                         Standardized
                               Mean        p less        Discriminant
Variable         F(2,173)      Square      than          Coefficients

Block 6 Score     0.832        37.318      0.437            0.623
Block 7 Score     0.934        75.493      0.395            0.789
Block 8 Score     0.767        45.048      0.466           -1.164



Table 4

Multivariate Test of the Type of Testing
Effects Within Shift C in AGE

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      3.562          6           342.0        0.002
2 Through 2      0.103          2           171.5        0.902

Univariate F tests

                                                         Standardized
                               Mean        p less        Discriminant
Variable         F(2,173)      Square      than          Coefficients

Block 6 Score     4.595       206.099      0.011            1.150
Block 7 Score     0.732        59.178      0.482           -0.326
Block 8 Score     0.736        43.229      0.481           -0.700






Table 5

Multivariate Test of the Type of Testing
Effects Within Shift B in AGE

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      2.424          6           342.0        0.026
2 Through 2      0.201          2           171.5        0.818

Univariate F tests

                                                         Standardized
                               Mean        p less        Discriminant
Variable         F(2,173)      Square      than          Coefficients

Block 6 Score     0.581        26.062      0.560            0.542
Block 7 Score     4.307       348.293      0.015           -1.140
Block 8 Score     0.182        10.671      0.834            0.304



Table 6

Multivariate Test of the Type of Testing
Effects Within Shift A in AGE

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      1.259          6           342.0        0.275
2 Through 2      0.651          2           171.5        0.523

Univariate F tests

                                                         Standardized
                               Mean        p less        Discriminant
Variable         F(2,173)      Square      than          Coefficients

Block 6 Score     0.744        33.361      0.477            0.813
Block 7 Score     0.686        55.448      0.505            0.625
Block 8 Score     0.442        25.992      0.643           -1.139






Table 7

Means and Standard Deviations of Block Scores by
Type of Testing, Type of Remediation, and Shift in JEM

Type of    Type of
Testing    Remediation    Shift     N              Block 2        Block 3

M.C.       Cont.          A        18     Mean     [illegible]    83.0
                                          S.D.      8.1            7.5
M.C.       Cont.          B        24     Mean     88.5           89.0
                                          S.D.      6.2            5.9
M.C.       Spec.          A         6     Mean     77.0           81.3
                                          S.D.      5.5            3.6
M.C.       Spec.          B         8     Mean     78.9           77.8
                                          S.D.      7.3            5.3
P.O.       Cont.          A        26     Mean     91.3           84.2
                                          S.D.      5.6            6.7
P.O.       Cont.          B        29     Mean     85.7           88.0
                                          S.D.      7.6            6.5
P.O.       Spec.          A        25     Mean     89.6           87.1
                                          S.D.      5.9            4.9
P.O.       Spec.          B        27     Mean     85.6           88.8
                                          S.D.      7.7            4.6
100P       Cont.          A        27     Mean     89.8           88.5
                                          S.D.      6.7            5.8
100P       Cont.          B        15     Mean     87.1           85.1
                                          S.D.      8.2            5.0
100P       Spec.          A        17     Mean     81.3           81.8
                                          S.D.      5.9            6.3
100P       Spec.          B        32     Mean     85.8           83.4
                                          S.D.      7.0            7.4






Table 8

Multivariate Test of the Type of Testing x Type of Remediation x
Shift Interaction in JEM Using Wilks' Lambda Criterion

                           Degrees of     Degrees of
                           Freedom for    Freedom for    p less
Test of Roots      F       Hypothesis     Error          than

1 through 2      2.585          4          482.000       0.036
2 through 2      0.110          1          241.500       0.740



Table 9

Multivariate Test of the Type of Testing
Effects Within Shift B Special Remediation in JEM

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      7.363          4          482.000       0.001
2 Through 2      4.878          1          241.500       0.028

Univariate F tests

                                                         Standardized
                                                         Discriminant
                               Mean        p less        Coefficients
Variable         F(2,242)      Square      than            1        2

Block 2 Score     3.496       165.707      0.032        -0.295    1.157
Block 3 Score    11.924       443.185      0.001         1.130   -0.386






Table 10

Multivariate Test of the Type of Testing
Effects Within Shift A Special Remediation in JEM

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      6.028          4          482.000       0.001
2 Through 2      0.610          1          241.500       0.436

Univariate F tests

                                                         Standardized
                               Mean        p less        Discriminant
Variable         F(2,242)      Square      than          Coefficients

Block 2 Score    12.001       [illegible]  0.001            0.946
Block 3 Score     4.686       [illegible]  0.010            0.093









Table 11

Multivariate Test of the Type of Testing
Effects Within Shift B Control Remediation in JEM

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      1.902          4          482.000       0.109
2 Through 2      2.105          1          241.500       0.148

Univariate F tests

                                                         Standardized
                                                         Discriminant
                               Mean        p less        Coefficients
Variable         F(2,242)      Square      than            1        2

Block 2 Score     1.092        51.712      0.337        -0.792    0.894
Block 3 Score     2.015        74.899      0.136         1.182    0.175



Table 12

Multivariate Test of the Type of Testing
Effects Within Shift A Control Remediation in JEM

Tests of significance using Wilks' lambda criterion

                           Degrees of     Degrees of
Tests of the               Freedom for    Freedom for    p less
Roots              F       Hypothesis     Error          than

1 Through 2      5.894          4          482.000       0.001
2 Through 2      9.892          1          241.500       0.002

Univariate F tests

                                                         Standardized
                                                         Discriminant
                               Mean        p less        Coefficients
Variable         F(2,242)      Square      than            1        2

Block 2 Score     5.450       258.176      0.005        -1.069    0.533
Block 3 Score     5.340       198.477      0.005         1.031    0.604



APPENDIX VI 



Frequency Distributions of Responses to
Student Attitude Questionnaires



Appendix VI (Continued)

[Frequency distribution tables of responses (options A through E) to the
student attitude questionnaire items. The cell counts are not recoverable
from the scanned pages.]

*Indicates significant chi-square at .05 level in JEM.

APPENDIX VII 



Instructor Questionnaires 









INSTRUCTOR INTERVIEW 



1. Instructor Name: 

2. Instructor class: 

3* Do you look at your students* test results? If so, what do you look for? 



No 

1 



Yes 

36 



Items Missed 
3h 



Total Scores 
3 

1:. Do you look at the questions that each student misses? 



No 

h 



Yds 

33 



5. Do you correct the tests yourself? 



No 

3 



Yes 

3h 



6. Do you use a total score in assigning students to remediation or can you 
use responses to particular questions? 

Total score 37 

7. How do you detennino the instruction in the remedial classes? 

By questions missed Miscellaneous 
30 7 

8. How far ahead are students scheduled for remediation? 



Daily 

33 



Other 

h 



9- Are any people required to attend remediation for reasons other than 
poor tost results? If so, for what other reasons are peopD.e assigned 
to remediation? ^ ^ 



No 

10 



Yes 

27 






10. Does the attendance vary according to individuals' problems with particular
subjects or are people assigned to remedial sequences regularly until they
have improved?

Assigned regularly Both Attendance varies 

7 7 23 

11. Have you had any instruction in how to use the test results in making
remedial sessions?

No Yes 

18 19 



FOR INSTRUCTORS OF CLASSES USING CONFIDENCE TESTING

12. How many people were you able to find who placed a large amount of
confidence in wrong answers?

Few About half Many

30 4 3

13. How often, approximately, did students place all their confidence in 
their responses? 

Seldom Sometimes Very often 

2 2 33 

14. Did you treat students who placed large amounts of confidence in wrong
answers any differently? 

No Yes 

30 7 

15. Were separate remediation sessions set up for both students who placed
large amounts of confidence in wrong answers and students who did not
know the answer as evidenced by placing small amounts of confidence in
their preferred answer?

No Yes 

37 0 

16. Did the confidence test scores influence your decisions about the
remediation? 



No Yes 

31 6 






17. What difficulties did you have in interpreting the confidence scores?

Some Grading Couldn't interpret

26 9 2

18. Were students pressed for time when taking the confidence test? 

No Yes 

36 1 

19. Did students have much difficulty in assigning their confidence? 

No Yes 

36 1 






Unclassified 



Security Classification 



DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)



1. ORIGINATING ACTIVITY (Corporate author)

Educational Testing Service
Princeton, New Jersey 08540

2a. REPORT SECURITY CLASSIFICATION

2b. GROUP

3. REPORT TITLE

AN EVALUATION OF THE FEASIBILITY OF CONFIDENCE TESTING AS A DIAGNOSTIC AID IN TECHNICAL
TRAINING



4. DESCRIPTIVE NOTES (Type of report and inclusive dates)

Final Report (July 1970 to July 1971) 



5. AUTHOR(S) (First name, middle initial, last name)

Gary J. Echternacht
Wayne S. Sellman
Robert F. Boldt
Joseph D. Young

6. REPORT DATE

July 1971 



7a. TOTAL NO. OF PAGES

127

7b. NO. OF REFS

22



8a. CONTRACT OR GRANT NO.

F41609-70-C-0044

b. PROJECT NO. 1121

c. Task No. 112103

d. Work Unit No. 112103003



9a. ORIGINATOR'S REPORT NUMBER(S)

AFHRL-TR-71-33

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)



10. DISTRIBUTION STATEMENT 

Approved for public release; distribution unlimited. 




11. SUPPLEMENTARY NOTES

12. SPONSORING MILITARY ACTIVITY

Technical Training Division
Lowry Air Force Base, Colorado 80230





13. ABSTRACT

This report describes a study to determine the feasibility and the cost-effectiveness of using confidence testing as
a diagnostic aid in technical training programs. Two types of confidence testing, Pick-One and Distribute 100 Points,
were developed for comparison to conventional multiple-choice testing. The study was carried out in two technical
training courses, Aerospace Ground Equipment Repairman (AGE) and Jet Engine Mechanic (JEM), currently being
taught at Chanute Air Force Base, Illinois. The criteria for feasibility included end of block examination scores, number
of student remediational sessions, and both student and instructor attitudes. In addition, the relationship of various
personality variables to confidence test scores was examined for both types of confidence testing. The major finding
was that while scoring was somewhat more time consuming, end of block examination scores improved slightly and the
number of remediations required declined slightly when either confidence testing method was employed. Other areas of
investigation produced essentially null results.



DD FORM 1473 (1 NOV 68)






14. KEY WORDS          LINK A         LINK B         LINK C
                     ROLE   WT      ROLE   WT      ROLE   WT



Psychological testing 
psychometrics 
confidence testing 
subjective probability 
technical training 


















